Compositions for the diagnosis and treatment of Chediak-Higashi syndrome

ABSTRACT

The present invention relates to the identification of novel nucleic acid molecules and proteins encoded by such nucleic acid molecules or degenerate variants thereof, that participate in the differentiation and/or function of intracellular vesicles. The nucleic acid molecules of the present invention represent the genes corresponding to the mammalian bg gene, a gene that, when mutated, is responsible for the human Chediak-Higashi syndrome.

[0001] This application claims priority under 35 U.S.C. §119(e) to U.S. provisional application Serial No. 60/021,064, filed Jul. 1, 1996, U.S. provisional application Serial No. 60/015,673, filed Apr. 19, 1996 and U.S. provisional application Serial No. 60/013,883, filed Mar. 22, 1996, each of which is incorporated herein by reference in its entirety.

1. INTRODUCTION

[0002] The present invention relates to the identification of novel nucleic acid molecules and proteins encoded by such nucleic acid molecules or degenerate, especially naturally occurring, variants thereof, that, when mutated, lead to disorders involving abnormal intracellular vesicles, especially abnormal lysosomes, melanosomes, platelet dense granules and cytolytic granules, including Chediak-Higashi syndrome (CHS). The nucleic acid molecules of the present invention represent the genes corresponding to the mammalian bg gene, including the human bg gene, which are involved in the normal differentiation and/or function of such intracellular vesicles. Nucleic acid molecules representing loss-of-function alleles of the human bg gene bring about Chediak-Higashi syndrome (CHS), in individuals homozygous for such alleles.

[0003] In particular, the compositions of the present invention include nucleic acid molecules (e.g., bg gene), including recombinant DNA molecules, cloned genes or degenerate, especially naturally occurring, variants thereof, which encode novel bg gene products, and antibodies directed against such bg gene products or conserved variants or fragments thereof. The compositions of the present invention additionally include cloning vectors, including expression vectors, containing the nucleic acid molecules of the invention and hosts which have been transformed with such nucleic acid molecules.

[0004] In addition, this invention presents methods for the diagnostic evaluation and prognosis of disorders involving abnormal intracellular vesicles, especially abnormal lysosomes, melanosomes, platelet dense granules and cytolytic granules, including CHS, and for the identification of subjects having a predisposition to such conditions. For example, nucleic acid molecules of the invention can be used as diagnostic hybridization probes or as primers for diagnostic PCR analysis for the identification of bg gene mutations, allelic variations and regulatory defects in the bg gene.

[0005] Further, methods and compositions are presented for the treatment of disorders involving abnormal intracellular vesicles, especially abnormal lysosomes, melanosomes, platelet dense granules and cytolytic granules, including CHS. Such methods and compositions are capable of modulating the level of bg gene expression and/or the level of bg gene product activity.

[0006] Still further, the present invention relates to methods for the use of the bg gene, bg gene products and/or cells expressing wild type or mutant bg gene sequences for the identification of compounds which modulate bg gene expression and/or the activity of bg gene products. Such compounds can be used as agents to control disorders involving abnormal intracellular vesicles, especially abnormal lysosomes, melanosomes, platelet dense granules and cytolytic granules, in particular, therapeutic agents in the treatment of CHS.

2. BACKGROUND OF THE INVENTION

[0007] Chediak-Higashi syndrome (CHS) is a lethal autosomal recessive disorder of humans mapping to 1q43. The clinical manifestations of this disorder include hypopigmentation, defective immune cell function, including severely impaired natural killer cell activity, and defective antibody-dependent, lymphocyte-mediated cytolysis against tumor cell targets. Further, neural degeneration is observed and, finally, a mononuclear cell lymphoma develops, which causes the death of afflicted individuals.

[0008] As mentioned above, the disease is accompanied by a marked susceptibility to infections. Young children have repeated infections, usually with gram-positive organisms of the staphylococcal and streptococcal type. Further, during the course of the disease, children may develop a progressive peripheral neuropathy. Children surviving the early infectious episodes (8-18 years of age) most frequently develop terminal lymphoreticular malignancy. Few patients survive beyond twenty years.

[0009] Pathological manifestation of the syndrome includes enlarged vesicles affecting lysosomes, melanosomes, platelet dense granules, cytolytic granules and Schwann cell granules. The abnormal size of these vesicles is thought to result from a malregulation of vesicle fusion or fission. Abnormal membrane-bound lysosomal-like organelles have been found in cells of the buccal mucosa, Schwann cells, pancreas, liver, gastric and duodenal mucosa, adrenal, pituitary, spleen, kidney, bone marrow, hair, skin, iris and conjunctiva. The giant granules observed resemble the normal granules of the specific cell type in both fine structure and cytochemical reactions and result from the fusion of small primary granules.

[0010] Similar phenotypes are found in other species, most notably the beige mouse and the Aleutian mink, but are also found in such species as the Persian cat, cattle and even the killer whale. Somatic cell fusion studies have suggested that mutations within the same gene in mouse, mink, and man were responsible for the CHS-like phenotype in each of these species. In mice, the gene responsible for such a phenotype is the beige (bg) gene. Such studies, however, were not able to elucidate either the function or the identity of the bg gene product.

[0011] Over the past thirty years numerous theories have been invoked to explain the nature of these disorders. For example, it has been suggested that the defect might be caused by alterations in membrane fluidity, defects in microtubules or microtubule associated proteins, or changes in cyclic nucleotide levels. Upon further examination, though, each of these theories has been found to be inadequate, thus highlighting the fact that a great need remains for the discovery of the causative agent of the lethal Chediak-Higashi syndrome genetic disorder.

3. SUMMARY OF THE INVENTION

[0012] The present invention relates to the identification of novel nucleic acid molecules and proteins encoded by such nucleic acid molecules or degenerate, especially naturally occurring, variants thereof, that, when mutated, lead to disorders involving abnormal intracellular vesicles, especially abnormal lysosomes, melanosomes, platelet dense granules and cytolytic granules, including Chediak-Higashi syndrome (CHS). The nucleic acid molecules of the present invention represent the genes corresponding to the mammalian bg gene, including the human bg gene, which are involved in the normal differentiation and/or function of such intracellular vesicles. Nucleic acid molecules representing loss-of-function alleles of the human bg gene bring about Chediak-Higashi syndrome (CHS), in individuals homozygous for such alleles.

[0013] In particular, the compositions of the present invention include nucleic acid molecules (e.g., bg gene), including recombinant DNA molecules, cloned genes or degenerate, especially naturally occurring, variants thereof, which encode novel bg gene products, and antibodies directed against such bg gene products or conserved variants or fragments thereof. The compositions of the present invention additionally include cloning vectors, including expression vectors, containing the nucleic acid molecules of the invention and hosts which have been transformed with such nucleic acid molecules.

[0014] Nucleic acid sequences of wild type and mutant forms of the murine bg gene are provided. The wild type murine bg gene produces a transcript of approximately 12-14 kb. The amino acid sequence of the predicted bg gene product indicates that the protein is novel.

[0015] Nucleic acid sequences of wild type forms of the human bg gene are also provided. The human bg gene produces alternatively spliced transcripts. The long, putatively full length bg transcript encodes a bg protein of 3801 amino acid residues, as shown in FIG. 7. A short form, alternatively spliced, human bg transcript encodes a bg protein of 3672 amino acid residues, as shown in FIG. 8. The amino acid sequence of the predicted human bg gene products indicates that the proteins are novel.

[0016] In addition, this invention presents methods for the diagnostic evaluation and prognosis of disorders involving abnormal intracellular vesicles, especially abnormal lysosomes, melanosomes, platelet dense granules and cytolytic granules, including CHS, and for the identification of subjects having a predisposition to such conditions. For example, nucleic acid molecules of the invention can be used as diagnostic hybridization probes or as primers for diagnostic PCR analysis for the identification of bg gene mutations, allelic variations and regulatory defects in the bg gene.

[0017] Further, methods and compositions are presented for the treatment of disorders involving abnormal intracellular vesicles, especially abnormal lysosomes, melanosomes, platelet dense granules and cytolytic granules, including CHS. Such methods and compositions are capable of modulating the level of bg gene expression and/or the level of bg gene product activity.

[0018] Still further, the present invention relates to methods for the use of the bg gene, bg gene products and/or cells expressing wild type or mutant bg gene sequences for the identification of compounds which modulate bg gene expression and/or the activity of bg gene products. Such compounds can be used as agents to control disorders involving abnormal intracellular vesicles, especially abnormal lysosomes, melanosomes, platelet dense granules and cytolytic granules, in particular, therapeutic agents in the treatment of CHS.

[0019] This invention is based, in part, on a combination of in vitro complementation using yeast artificial chromosomes (YACs), positional cloning techniques and mutation detection which, together, were used to successfully identify and clone the murine bg gene, as described in the Examples, below, presented in Sections 6-9. Such analyses included the identification and sequencing of two independent bg mutations, one an insertion of 117 base pairs and the other a point mutation which results in an in-frame, premature stop codon. Both mutations result in the production of transcripts encoding truncated BG proteins.

4. DESCRIPTION OF THE FIGURES

[0020] FIG. 1. Genetic and physical map of mouse chromosome 13 region containing the bg gene interval.

[0021] FIGS. 2A-2B. Diagram depicting yeast artificial chromosomes (YACs) spanning the minimal bg interval. Ability of YACs to complement the bg mutation is noted.

[0022] FIG. 3A. Wild type and bg mouse fibroblasts plated together to demonstrate differences in phenotypes between the two cell types. The arrows denote two wild type cells. Two bg cells are just below the indicated wild type cells. Note the difference in lysosome size and distribution. Magnification was approximately 500×.

[0023] FIG. 3B. The initial mixed isolate of complemented colony 195-4, isolated from 400 μg/ml G418. This colony, as isolated from the plate, contained complemented and uncomplemented bg cells. Magnification was approximately 500×.

[0024] FIG. 3C. Colony 195-4 after 10 days in 800 μg/ml G418. Note that the colony after this period of time was homogeneously complemented (i.e., all of the bg cells appeared wild type with respect to the lysosomal morphology). Magnification was approximately 500×.

[0025] FIG. 3D. Colony 195-4 after culture in 800 μg/ml G418 for ten days, then cultured without G418 for thirty days. The result illustrated here demonstrates that the YAC was responsible for the complementation, in that, when the cells were cultured without G418, they lost the YAC and reverted back to the mutant bg morphology. Magnification was approximately 500×.

[0026] FIG. 4. Nucleotide sequence (bottom line; SEQ ID NO:1) and amino acid sequence (top line; SEQ ID NO:2) of the 22B/30B gene (the murine bg gene).

[0027] FIGS. 5A-5D. Southern blot analysis of a chromosomal rearrangement associated with the bg allele. Southern blot analysis of a 510 bp fragment of 22B/30B hybridized to lane (1) C57BL/6J; (2) C57BL/6J-bg; (3) C57BL/6J-bg^(J); (4) C57BL/6J-bg^(10J); (5) C57BL/6J-bg^(11J); (6) C3H/HeJ; (7) C3H/HeJ-bg^(2J); (8) DBA/2J; and (9) DBA/2J-CO-bg^(8J) DNAs digested with 5A: HindIII; 5B: PstI; 5C: BglII; and 5D: TaqI. Size markers are indicated.

[0028] FIG. 6. Diagram illustrating the location and structure of a Line 1 insertion representing a mutation within the bg gene yielding truncated BG proteins which leads to a mutant bg phenotype.

[0029] FIG. 7. Human long form (putative full length) bg gene nucleotide (bottom line) and derived amino acid (top line) sequences.

[0030] FIG. 8. Human short form, alternatively spliced bg gene nucleotide (bottom line) and derived amino acid (top line) sequences.

5. DETAILED DESCRIPTION OF THE INVENTION

[0031] Described herein are novel mammalian genes, the beige (bg) genes, including the human bg gene. Such genes are involved in the normal differentiation and/or function of intracellular vesicles. When such sequences are mutated such that, for example, a functional beige gene product (BG) is no longer produced, disorders develop involving abnormal intracellular vesicles, especially abnormal lysosomes, melanosomes, platelet dense granules and cytolytic granules, including Chediak-Higashi syndrome. Also described are recombinant mammalian, including human, bg DNA molecules, cloned genes, or degenerate variants thereof. The compositions of the present invention further include bg gene products (e.g., proteins) that are encoded by the bg gene, and the modulation of bg gene expression and/or bg gene product activity in the treatment of disorders involving abnormal intracellular vesicles, including, but not limited to, CHS. Also described herein are antibodies against bg gene products (e.g., proteins), or conserved variants or fragments thereof, and nucleic acid probes useful for the identification of bg gene mutations and the use of such nucleic acid probes in, for example, the identification of individuals predisposed to such disorders and/or individuals who carry mutant bg alleles. Further described are methods for the use of the bg gene and/or bg gene products in the identification of compounds which modulate the activity of the bg gene product.

[0032] Murine bg nucleic acid and amino acid compositions of the invention are demonstrated in the Examples presented, below, in Sections 6 through 9. A gene, referred to herein as the 22B/30B gene, representing a candidate for the murine bg gene was identified via a combination of genetic and physical mapping coupled with a yeast artificial chromosome (YAC) complementation assay by which complementation of the bg mutation was assessed via analysis of the morphological phenotype of YAC-transformed bg fibroblasts. Identification and sequencing of two independent bg mutations revealed that the mutations resided within the 22B/30B gene, representing compelling evidence that the 22B/30B gene was the bg gene. For clarity, it should, therefore, be noted that the murine bg gene is also referred to herein as the 22B/30B gene.

[0033] Human bg nucleic acid and amino acid compositions of the invention are demonstrated in Example 10, below.

5.1. The bg Gene

[0034] The bg gene, murine nucleic acid sequence of which is shown in FIG. 4 (SEQ ID NO:1) and human nucleic acid sequences of which are shown in FIGS. 7 and 8, is a novel gene involved in the normal differentiation and/or function of intracellular vesicles. Nucleic acid sequences of the bg gene are described herein. As used herein, “bg gene” refers to (a) a gene containing the DNA sequence shown in FIG. 4, FIG. 7 or FIG. 8; (b) any DNA sequence that encodes the amino acid sequence shown in FIG. 4 (SEQ ID NO:2), FIG. 7 or FIG. 8; (c) any DNA sequence that hybridizes to the complement of the DNA sequences that encode the amino acid sequence shown in FIG. 4, FIG. 7 or FIG. 8, under highly stringent conditions, e.g., hybridization to filter-bound DNA in 0.5 M NaHPO₄, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65° C., and washing in 0.1×SSC/0.1% SDS at 68° C. (Ausubel F. M. et al., eds., 1989, Current Protocols in Molecular Biology, Vol. I, Green Publishing Associates, Inc., and John Wiley & Sons, Inc., New York, at p. 2.10.3); and/or (d) any DNA sequence that hybridizes to the complement of the DNA sequences that encode the amino acid sequence shown in FIG. 4, FIG. 7 or FIG. 8, under less stringent conditions, such as moderately stringent conditions, e.g., washing in 0.2×SSC/0.1% SDS at 42° C. (Ausubel et al., 1989, supra), yet which still encodes a functional bg gene product. As used herein, bg gene may also refer to degenerate variants of DNA sequences (a) through (d), including naturally occurring variants. The term “functional bg gene product,” as used herein, refers to a gene product encoded by a nucleic acid sequence capable of complementing a recessive, loss-of-function bg mutation.

[0035] The invention also includes nucleic acid molecules, preferably DNA molecules, that hybridize to, and are therefore the complements of, the DNA sequences (a) through (d), in the preceding paragraph, and to degenerate variants of the DNA sequences shown in (a) through (d) in the preceding paragraph. Hybridization conditions may be highly stringent or less highly stringent, as described above. In instances wherein the nucleic acid molecules are deoxyoligonucleotides (“oligos”), highly stringent conditions may refer, e.g., to washing in 6×SSC/0.05% sodium pyrophosphate at 37° C. (for 14-base oligos), 48° C. (for 17-base oligos), 55° C. (for 20-base oligos), and 60° C. (for 23-base oligos). These nucleic acid molecules may encode or act as bg gene antisense molecules, useful, for example, in bg gene regulation, as antisense primers in amplification reactions of bg gene nucleic acid sequences and/or as hybridization probes for the identification of bg nucleic acid sequences. With respect to diagnostic procedures, such molecules may be used as components of methods whereby, for example, the presence of a particular bg allele responsible for causing a disorder, such as CHS, may be detected.
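By way of illustration only, the oligonucleotide wash temperatures recited above for washing in 6×SSC/0.05% sodium pyrophosphate can be expressed as a simple lookup keyed by oligo length. The following Python sketch is a minimal example under stated assumptions: the function name `wash_temperature_c` and the example oligo are hypothetical and are not part of the disclosure, and lengths other than the four recited are rejected rather than interpolated.

```python
# Wash temperatures (degrees C) in 6xSSC/0.05% sodium pyrophosphate,
# keyed by oligo length in bases, as recited in the text.
OLIGO_WASH_TEMP_C = {14: 37, 17: 48, 20: 55, 23: 60}


def wash_temperature_c(oligo: str) -> int:
    """Return the recited highly stringent wash temperature for an oligo.

    Hypothetical helper: only the four lengths given in the text are accepted.
    """
    length = len(oligo)
    if length not in OLIGO_WASH_TEMP_C:
        # The text gives values only for 14-, 17-, 20- and 23-base oligos.
        raise ValueError(f"no wash temperature recited for {length}-base oligos")
    return OLIGO_WASH_TEMP_C[length]


if __name__ == "__main__":
    example_oligo = "ATGCATGCATGCATGCA"  # hypothetical 17-base oligo
    print(wash_temperature_c(example_oligo))
```

Raising an error for unlisted lengths, rather than guessing an intermediate temperature, keeps the sketch faithful to the conditions actually recited.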

[0036] The invention also encompasses (a) DNA vectors that contain any of the foregoing bg coding sequences and/or their complements (i.e., antisense); (b) DNA expression vectors that contain any of the foregoing bg coding sequences operatively associated with a regulatory element that directs the expression of the coding sequences; and (c) genetically engineered host cells that contain any of the foregoing bg coding sequences operatively associated with a regulatory element that directs the expression of the coding sequences in the host cell. As used herein, regulatory elements include but are not limited to inducible and non-inducible promoters, enhancers, operators and other elements known to those skilled in the art that drive and regulate expression. Such regulatory elements include but are not limited to the cytomegalovirus hCMV immediate early gene, the early or late promoters of SV40 adenovirus, the lac system, the trp system, the TAC system, the TRC system, the major operator and promoter regions of phage λ, the control regions of fd coat protein, the promoter for 3-phosphoglycerate kinase, the promoters of acid phosphatase, and the promoters of the yeast α-mating factors. The invention includes fragments of any of the DNA sequences disclosed herein.

[0037] bg gene sequences include, for example, alleles and homologs of genes containing the sequence depicted in FIG. 4, FIG. 7 or FIG. 8, wherein such alleles are present at the same locus as the sequence depicted in FIG. 4, FIG. 7 or FIG. 8 and homologs are genes at other genetic loci within the genome that encode proteins which have extensive homology to one or more domains of the bg gene product. Such bg gene alleles and homologs can be identified and readily isolated, without undue experimentation, by molecular biological techniques well known in the art.

[0038] As an example, in order to clone a human bg gene sequence using isolated murine bg gene sequences as disclosed herein, such murine bg gene sequences may be labeled and used to screen a cDNA library constructed from mRNA obtained from appropriate cells or tissues of interest (e.g., a cell or tissue known to express the bg gene in mouse, and/or a cell or tissue known to be affected by CHS in humans, such as, for example, a retinal library). The hybridization washing conditions used should normally be of a lower stringency when the cDNA library is derived from an organism different from the type of organism from which the labeled sequence was derived, but appropriate stringency conditions for the specific sequence and library being utilized will be apparent to those of skill in the art.

[0039] Low stringency conditions, for example, are well known to those of skill in the art, and will vary predictably depending on the specific organisms from which the library and the labeled sequences are derived. For guidance regarding such conditions see, for example, Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, N.Y.; and Ausubel et al., 1989, Current Protocols in Molecular Biology, Green Publishing Associates and Wiley Interscience, N.Y.

[0040] Alternatively, the labeled fragment may be used to screen a genomic library derived from the organism of interest, again, using appropriately stringent conditions. Such a screening procedure could be utilized, for example, to identify either bg alleles or bg homolog genes located in different portions of the genome containing sequences encoding one or more domains exhibiting extensive homology to one or more domains encoded by the bg gene.

[0041] Further, a bg gene sequence may be isolated from nucleic acid of the organism of interest by performing PCR using two degenerate oligonucleotide primer pools designed on the basis of amino acid sequences within the bg gene product disclosed herein. The template for the reaction may be cDNA obtained by reverse transcription of mRNA prepared from, for example, human or non-human cell lines or tissue known or suspected to express a bg gene allele.
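One consideration when designing degenerate primer pools from amino acid sequence is the size of the pool, i.e., the number of distinct oligonucleotides needed to cover every codon choice for a given peptide. The following Python sketch, offered as a hedged illustration rather than as the procedure actually used, counts codons per residue under the standard genetic code and picks the least degenerate window of a given width. The helper names and the example peptide are hypothetical; the peptide is not taken from the bg sequences of FIG. 4, FIG. 7 or FIG. 8.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def pool_degeneracy(peptide: str) -> int:
    """Number of distinct oligos needed to encode every codon choice for the peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide)


def least_degenerate_window(protein: str, width: int):
    """Return (start, peptide, degeneracy) for the window with the smallest pool."""
    if width < 1 or width > len(protein):
        raise ValueError("window width must be between 1 and the protein length")
    windows = [
        (pool_degeneracy(protein[i:i + width]), i, protein[i:i + width])
        for i in range(len(protein) - width + 1)
    ]
    # Tuples sort by degeneracy first, then by start position (earliest wins ties).
    degeneracy, start, peptide = min(windows)
    return start, peptide, degeneracy


if __name__ == "__main__":
    hypothetical_protein = "LRSMWKDEFLL"  # illustrative only, not a bg sequence
    print(least_degenerate_window(hypothetical_protein, 6))
```

Windows rich in methionine and tryptophan (one codon each) give the smallest pools, while leucine, arginine and serine (six codons each) inflate them quickly.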

[0042] The PCR product may be subcloned and sequenced to ensure that the amplified sequences represent the sequences of a bg gene nucleic acid sequence. The PCR fragment may then be used to isolate a full length cDNA clone by a variety of methods. For example, the amplified fragment may be labeled and used to screen a cDNA library, such as a bacteriophage cDNA library. Alternatively, the labeled fragment may be used to isolate genomic clones via the screening of a genomic library.

[0043] PCR technology may also be utilized to isolate full length cDNA sequences. For example, RNA may be isolated, following standard procedures, from an appropriate cellular or tissue source (i.e., one known, or suspected, to express the bg gene, and/or one known to be affected by disorders caused by bg mutations). A reverse transcription reaction may be performed on the RNA using an oligonucleotide primer specific for the most 5′ end of the amplified fragment for the priming of first strand synthesis. The resulting RNA/DNA hybrid may then be “tailed” with guanines using a standard terminal transferase reaction, the hybrid may be digested with RNase H, and second strand synthesis may then be primed with a poly-C primer. Thus, cDNA sequences upstream of the amplified fragment may easily be isolated. For a review of cloning strategies which may be used, see e.g., Sambrook et al., 1989, supra.

[0044] bg gene sequences may additionally be used to isolate mutant bg gene alleles. Such mutant alleles may be isolated from individuals either known or proposed to have a genotype which contributes to the symptoms of intracellular vesicle disorders, including CHS. Mutant alleles and mutant allele products may then be utilized in the therapeutic and diagnostic systems described below. Additionally, such bg gene sequences can be used to detect bg gene regulatory (e.g., promoter) defects which can affect intracellular vesicle differentiation and/or function.

[0045] A cDNA of a mutant bg gene may be isolated, for example, by using PCR, a technique which is well known to those of skill in the art. In this case, the first cDNA strand may be synthesized by hybridizing an oligo-dT oligonucleotide to mRNA isolated from tissue known or suspected to express the bg gene in an individual putatively carrying the mutant bg allele, and by extending the new strand with reverse transcriptase. The second strand of the cDNA is then synthesized using an oligonucleotide that hybridizes specifically to the 5′ end of the normal gene. Using these two primers, the product is then amplified via PCR, cloned into a suitable vector, and subjected to DNA sequence analysis through methods well known to those of skill in the art. By comparing the DNA sequence of the mutant bg allele to that of the normal bg allele, the mutation(s) responsible for the loss or alteration of function of the mutant bg gene product can be ascertained.
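The comparison of a mutant sequence to the normal sequence can be sketched computationally. The following Python example is a minimal illustration under explicit assumptions: the two sequences are short hypothetical coding fragments of equal length that are already aligned and in frame (insertions, such as the 117 base pair insertion noted in the Summary, would require a proper alignment step that is not shown), and the function names are hypothetical. It lists point differences and translates each sequence up to the first stop codon to reveal premature truncation.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def point_differences(normal: str, mutant: str):
    """List (position, normal_base, mutant_base) for equal-length aligned sequences."""
    if len(normal) != len(mutant):
        raise ValueError("sequences must be aligned to equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(normal, mutant)) if a != b]


if __name__ == "__main__":
    normal = "ATGAAACAGGGT"  # hypothetical fragment, not a bg sequence
    mutant = "ATGAAATAGGGT"  # hypothetical C-to-T change creating a stop codon
    print(point_differences(normal, mutant))
    print(translate(normal), translate(mutant))
```

In this hypothetical pair, a single base change converts a glutamine codon into a stop codon, so the mutant translation is shorter than the normal one, which is the kind of in-frame premature stop described for one of the two bg mutations.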

[0046] Alternatively, a genomic library can be constructed using DNA obtained from an individual suspected of or known to carry the mutant bg allele, or a cDNA library can be constructed using RNA from a tissue known, or suspected, to express the mutant bg allele. The normal bg gene or any suitable fragment thereof may then be labeled and used as a probe to identify the corresponding mutant bg allele in such libraries. Clones containing the mutant bg gene sequences may then be purified and subjected to sequence analysis according to methods well known to those of skill in the art.

[0047] Additionally, an expression library can be constructed utilizing cDNA synthesized from, for example, RNA isolated from a tissue known, or suspected, to express a mutant bg allele in an individual suspected of or known to carry such a mutant allele. In this manner, gene products made by the putatively mutant tissue may be expressed and screened using standard antibody screening techniques in conjunction with antibodies raised against the normal bg gene product, as described, below, in Section 5.3. (For screening techniques, see, for example, Harlow, E. and Lane, eds., 1988, “Antibodies: A Laboratory Manual”, Cold Spring Harbor Press, Cold Spring Harbor.) In cases where a bg mutation results in an expressed gene product with altered function (e.g., as a result of a missense or a frameshift mutation), a polyclonal set of anti-bg gene product antibodies is likely to cross-react with the mutant bg gene product. Library clones detected via their reaction with such labeled antibodies can be purified and subjected to sequence analysis according to methods well known to those of skill in the art.

[0048] The Example presented in Section 9, below, demonstrates the successful isolation and sequencing of two bg mutations, each of which causes the production of truncated, non-functional BG proteins.

5.2. Protein Products of the bg Gene

[0049] bg gene products, peptide fragments thereof or fusion proteins, can be prepared for a variety of uses. For example, such gene products, peptide fragments thereof or fusion proteins, can be used for the generation of antibodies, in diagnostic assays, or for the identification of other cellular gene products involved in the differentiation and/or function of intracellular vesicles.

[0050] FIG. 4 depicts the murine bg gene product amino acid sequence. FIG. 7 depicts the long form, putative full length, human bg gene product amino acid sequence. As shown in FIG. 7, the long form human bg gene product contains 3801 amino acid residues. FIG. 8 depicts the short form human bg gene product encoded by an alternatively spliced short form of bg transcript. As shown in FIG. 8, the human bg gene product encoded by this short form transcript contains 3672 amino acid residues. The bg gene product, sometimes referred to herein as “BG”, may additionally include those gene products encoded by the bg gene sequences described in Section 5.1, above.

[0051] In addition, bg gene products may include proteins that represent functionally equivalent bg gene products. The term “functionally equivalent bg gene product”, as used herein, refers to a gene product encoded by a nucleic acid sequence capable of complementing a bg mutation. Such an equivalent bg gene product may contain deletions, additions or substitutions of amino acid residues within the amino acid sequence encoded by the bg gene sequences described, above, in Section 5.1, but which result in a silent change, thus producing a functionally equivalent bg gene product. Amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues involved. For example, nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and methionine; polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine; positively charged (basic) amino acids include arginine, lysine, and histidine; and negatively charged (acidic) amino acids include aspartic acid and glutamic acid.
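The four residue groupings recited above lend themselves to a simple lookup. The following Python sketch, using one-letter amino acid codes, checks whether a proposed substitution stays within a single recited group. The names `residue_class` and `is_same_class_substitution` are hypothetical, and membership in the same group is only a first screen: it does not by itself establish that a substitution is silent or that the resulting product is functionally equivalent.

```python
# Residue groupings as recited in the text, in one-letter amino acid codes.
AMINO_ACID_CLASSES = {
    "nonpolar": set("ALIVPFWM"),     # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}


def residue_class(residue: str) -> str:
    """Return the recited class name for a one-letter amino acid code."""
    for name, members in AMINO_ACID_CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"unrecognized amino acid code: {residue!r}")


def is_same_class_substitution(original: str, replacement: str) -> bool:
    """True when both residues fall in the same recited class."""
    return residue_class(original) == residue_class(replacement)


if __name__ == "__main__":
    print(is_same_class_substitution("L", "I"))  # leucine to isoleucine
    print(is_same_class_substitution("D", "K"))  # aspartic acid to lysine
```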

[0052] The bg gene products, peptide fragments thereof or fusion proteins, may be produced by recombinant DNA technology using techniques well known in the art. Thus, methods for preparing the bg gene polypeptides, peptides and fusion proteins of the invention by expressing nucleic acid containing bg gene sequences are described herein. Methods which are well known to those skilled in the art can be used to construct expression vectors containing bg gene product coding sequences and appropriate transcriptional and translational control signals. These methods include, for example, in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. See, for example, the techniques described in Sambrook et al., 1989, supra, and Ausubel et al., 1989, supra. Alternatively, RNA capable of encoding bg gene product sequences may be chemically synthesized using, for example, synthesizers. See, for example, the techniques described in “Oligonucleotide Synthesis”, 1984, Gait, M. J. ed., IRL Press, Oxford, which is incorporated by reference herein in its entirety.

[0053] A variety of host-expression vector systems may be utilized to express the bg gene coding sequences of the invention. Such host-expression systems represent vehicles by which the coding sequences of interest may be produced and subsequently purified, but also represent cells which may, when transformed or transfected with the appropriate nucleotide coding sequences, exhibit the bg gene product of the invention in situ. These include but are not limited to microorganisms such as bacteria (e.g., E. coli, B. subtilis) transformed with recombinant bacteriophage DNA, plasmid DNA or cosmid DNA expression vectors containing bg gene product coding sequences; yeast (e.g., Saccharomyces, Pichia) transformed with recombinant yeast expression vectors containing the bg gene product coding sequences; insect cell systems infected with recombinant virus expression vectors (e.g., baculovirus) containing the bg gene product coding sequences; plant cell systems infected with recombinant virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid) containing bg gene product coding sequences; or mammalian cell systems (e.g., COS, CHO, BHK, 293, 3T3) harboring recombinant expression constructs containing promoters derived from the genome of mammalian cells (e.g., metallothionein promoter) or from mammalian viruses (e.g., the adenovirus late promoter; the vaccinia virus 7.5K promoter).

[0054] In bacterial systems, a number of expression vectors may be advantageously selected depending upon the use intended for the bg gene product being expressed. For example, when a large quantity of such a protein is to be produced, for the generation of pharmaceutical compositions of bg protein or for raising antibodies to bg protein, for example, vectors which direct the expression of high levels of fusion protein products that are readily purified may be desirable. Such vectors include, but are not limited to, the E. coli expression vector pUR278 (Ruther et al., 1983, EMBO J. 2:1791), in which the bg gene product coding sequence may be ligated individually into the vector in frame with the lac Z coding region so that a fusion protein is produced; pIN vectors (Inouye & Inouye, 1985, Nucleic Acids Res. 13:3101-3109; Van Heeke & Schuster, 1989, J. Biol. Chem. 264:5503-5509); and the like. pGEX vectors may also be used to express foreign polypeptides as fusion proteins with glutathione S-transferase (GST). In general, such fusion proteins are soluble and can easily be purified from lysed cells by adsorption to glutathione-agarose beads followed by elution in the presence of free glutathione. The pGEX vectors are designed to include thrombin or factor Xa protease cleavage sites so that the cloned target gene product can be released from the GST moiety.

[0055] In an insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. The bg gene coding sequence may be cloned individually into non-essential regions (for example the polyhedrin gene) of the virus and placed under control of an AcNPV promoter (for example the polyhedrin promoter). Successful insertion of bg gene coding sequence will result in inactivation of the polyhedrin gene and production of non-occluded recombinant virus (i.e., virus lacking the proteinaceous coat coded for by the polyhedrin gene). These recombinant viruses are then used to infect Spodoptera frugiperda cells in which the inserted gene is expressed (see, e.g., Smith et al., 1983, J. Virol. 46:584; Smith, U.S. Pat. No. 4,215,051).

[0056] In mammalian host cells, a number of viral-based expression systems may be utilized. In cases where an adenovirus is used as an expression vector, the bg gene coding sequence of interest may be ligated to an adenovirus transcription/translation control complex, e.g., the late promoter and tripartite leader sequence. This chimeric gene may then be inserted in the adenovirus genome by in vitro or in vivo recombination. Insertion in a non-essential region of the viral genome (e.g., region E1 or E3) will result in a recombinant virus that is viable and capable of expressing bg gene product in infected hosts (see, e.g., Logan & Shenk, 1984, Proc. Natl. Acad. Sci. USA 81:3655-3659). Specific initiation signals may also be required for efficient translation of inserted bg gene product coding sequences. These signals include the ATG initiation codon and adjacent sequences. In cases where an entire bg gene, including its own initiation codon and adjacent sequences, is inserted into the appropriate expression vector, no additional translational control signals may be needed. However, in cases where only a portion of the bg gene coding sequence is inserted, exogenous translational control signals, including, perhaps, the ATG initiation codon, must be provided. Furthermore, the initiation codon must be in phase with the reading frame of the desired coding sequence to ensure translation of the entire insert. These exogenous translational control signals and initiation codons can be of a variety of origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate transcription enhancer elements, transcription terminators, etc. (see Bittner et al., 1987, Methods in Enzymol. 153:516-544).
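
The requirement that an exogenous initiation codon be in phase with the reading frame of the insert can be illustrated with a minimal Python sketch. The function name, the coordinates, and the short example sequences are hypothetical and are used only for illustration; they are not bg gene sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reads_through_insert(construct, atg_pos, insert_start, insert_end):
    """Return True if translation starting at the ATG at atg_pos would read
    the insert (construct[insert_start:insert_end]) in frame, with no stop
    codon between the ATG and the end of the insert.

    Coordinates are 0-based; insert_end is exclusive. All values are
    illustrative assumptions, not part of any real vector map.
    """
    seq = construct.upper()
    # An initiation codon must actually be present at the stated position.
    if seq[atg_pos:atg_pos + 3] != "ATG":
        return False
    # "In phase" means the insert begins a whole number of codons after the ATG.
    if (insert_start - atg_pos) % 3 != 0:
        return False
    # Walk codon by codon from the ATG to the end of the insert.
    for i in range(atg_pos, insert_end - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True

# Hypothetical constructs: leader + ATG + linker + partial coding insert.
IN_FRAME = "GCC" + "ATG" + "GGA" + "AAACCCGGG"      # 3-nt linker keeps phase
OUT_OF_FRAME = "GCC" + "ATG" + "GGAT" + "AAACCCGGG"  # 4-nt linker shifts phase

print(reads_through_insert(IN_FRAME, 3, 9, 18))
print(reads_through_insert(OUT_OF_FRAME, 3, 10, 19))
```

In this sketch a single extra linker nucleotide is enough to place the insert out of phase, which is the situation the paragraph above warns against.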

[0057] In addition, a host cell strain may be chosen which modulates the expression of the inserted sequences, or modifies and processes the gene product in the specific fashion desired. Such modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products may be important for the function of the protein. Different host cells have characteristic and specific mechanisms for the post-translational processing and modification of proteins and gene products. Appropriate cell lines or host systems can be chosen to ensure the correct modification and processing of the foreign protein expressed. To this end, eukaryotic host cells which possess the cellular machinery for proper processing of the primary transcript, glycosylation, and phosphorylation of the gene product may be used. Such mammalian host cells include but are not limited to CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3 and WI38 cell lines.

[0058] For long-term, high-yield production of recombinant proteins, stable expression is preferred. For example, cell lines which stably express the bg gene product may be engineered. Rather than using expression vectors which contain viral origins of replication, host cells can be transformed with DNA controlled by appropriate expression control elements (e.g., promoter, enhancer sequences, transcription terminators, polyadenylation sites, etc.), and a selectable marker. Following the introduction of the foreign DNA, engineered cells may be allowed to grow for 1-2 days in an enriched medium, and then are switched to a selective medium. The selectable marker in the recombinant plasmid confers resistance to the selection and allows cells to stably integrate the plasmid into their chromosomes and grow to form foci which in turn can be cloned and expanded into cell lines. This method may advantageously be used to engineer cell lines which express the bg gene product. Such engineered cell lines may be particularly useful in screening and evaluation of compounds that affect the endogenous activity of the bg gene product.

[0059] A number of selection systems may be used, including but not limited to the herpes simplex virus thymidine kinase (Wigler, et al., 1977, Cell 11:223), hypoxanthine-guanine phosphoribosyltransferase (Szybalska & Szybalski, 1962, Proc. Natl. Acad. Sci. USA 48:2026), and adenine phosphoribosyltransferase (Lowy, et al., 1980, Cell 22:817) genes, which can be employed in tk⁻, hgprt⁻ or aprt⁻ cells, respectively. Also, antimetabolite resistance can be used as the basis of selection for the following genes: dhfr, which confers resistance to methotrexate (Wigler, et al., 1980, Proc. Natl. Acad. Sci. USA 77:3567; O'Hare, et al., 1981, Proc. Natl. Acad. Sci. USA 78:1527); gpt, which confers resistance to mycophenolic acid (Mulligan & Berg, 1981, Proc. Natl. Acad. Sci. USA 78:2072); neo, which confers resistance to the aminoglycoside G-418 (Colberre-Garapin, et al., 1981, J. Mol. Biol. 150:1); and hygro, which confers resistance to hygromycin (Santerre, et al., 1984, Gene 30:147).

[0060] Alternatively, any fusion protein may be readily purified by utilizing an antibody specific for the fusion protein being expressed. For example, a system described by Janknecht et al. allows for the ready purification of non-denatured fusion proteins expressed in human cell lines (Janknecht, et al., 1991, Proc. Natl. Acad. Sci. USA 88:8972-8976). In this system, the gene of interest is subcloned into a vaccinia recombination plasmid such that the gene's open reading frame is translationally fused to an amino-terminal tag consisting of six histidine residues. Extracts from cells infected with recombinant vaccinia virus are loaded onto Ni²⁺·nitrilotriacetic acid-agarose columns and histidine-tagged proteins are selectively eluted with imidazole-containing buffers.

[0061] The bg gene products can also be expressed in transgenic animals. Animals of any species, including, but not limited to, mice, rats, rabbits, guinea pigs, pigs, micro-pigs, goats, and non-human primates, e.g., baboons, monkeys, and chimpanzees, may be used to generate bg transgenic animals.

[0062] Any technique known in the art may be used to introduce the bg gene transgene into animals to produce the founder lines of transgenic animals. Such techniques include, but are not limited to, pronuclear microinjection (Hoppe, P. C. and Wagner, T. E., 1989, U.S. Pat. No. 4,873,191); retrovirus mediated gene transfer into germ lines (Van der Putten et al., 1985, Proc. Natl. Acad. Sci. USA 82:6148-6152); gene targeting in embryonic stem cells (Thompson et al., 1989, Cell 56:313-321); electroporation of embryos (Lo, 1983, Mol. Cell. Biol. 3:1803-1814); and sperm-mediated gene transfer (Lavitrano et al., 1989, Cell 57:717-723); etc. For a review of such techniques, see Gordon, 1989, Transgenic Animals, Intl. Rev. Cytol. 115:171-229, which is incorporated by reference herein in its entirety.

[0063] The present invention provides for transgenic animals that carry the bg transgene in all their cells, as well as animals which carry the transgene in some, but not all, of their cells, i.e., mosaic animals. The transgene may be integrated as a single transgene or in concatamers, e.g., head-to-head tandems or head-to-tail tandems. The transgene may also be selectively introduced into and activated in a particular cell type by following, for example, the teaching of Lasko et al. (Lasko, M. et al., 1992, Proc. Natl. Acad. Sci. USA 89:6232-6236). The regulatory sequences required for such a cell-type specific activation will depend upon the particular cell type of interest, and will be apparent to those of skill in the art. When it is desired that the bg gene transgene be integrated into the chromosomal site of the endogenous bg gene, gene targeting is preferred. Briefly, when such a technique is to be utilized, vectors containing some nucleotide sequences homologous to the endogenous bg gene are designed for the purpose of integrating, via homologous recombination with chromosomal sequences, into and disrupting the function of the nucleotide sequence of the endogenous bg gene. The transgene may also be selectively introduced into a particular cell type, thus inactivating the endogenous bg gene in only that cell type, by following, for example, the teaching of Gu et al. (Gu, et al., 1994, Science 265:103-106). The regulatory sequences required for such a cell-type specific inactivation will depend upon the particular cell type of interest, and will be apparent to those of skill in the art.

[0064] Once transgenic animals have been generated, the expression of the recombinant bg gene may be assayed utilizing standard techniques. Initial screening may be accomplished by Southern blot analysis or PCR techniques to analyze animal tissues to assay whether integration of the transgene has taken place. The level of mRNA expression of the transgene in the tissues of the transgenic animals may also be assessed using techniques which include but are not limited to Northern blot analysis of tissue samples obtained from the animal, in situ hybridization analysis, and RT-PCR. Samples of bg gene-expressing tissue may also be evaluated immunocytochemically using antibodies specific for the bg transgene product.

5.3. Antibodies to bg Gene Products

[0065] Described herein are methods for the production of antibodies capable of specifically recognizing one or more bg gene product epitopes or epitopes of conserved variants or peptide fragments of the bg gene products.

[0066] Such antibodies may include, but are not limited to, polyclonal antibodies, monoclonal antibodies (mAbs), humanized or chimeric antibodies, single chain antibodies, Fab fragments, F(ab′)₂ fragments, fragments produced by a Fab expression library, anti-idiotypic (anti-Id) antibodies, and epitope-binding fragments of any of the above. Such antibodies may be used, for example, in the detection of a bg gene product in a biological sample and may, therefore, be utilized as part of a diagnostic or prognostic technique whereby patients may be tested for abnormal levels of bg gene products, and/or for the presence of abnormal forms of such gene products. Such antibodies may also be utilized in conjunction with, for example, compound screening schemes, as described, below, in Section 5.4.2, for the evaluation of the effect of test compounds on bg gene product levels and/or activity. Additionally, such antibodies can be used in conjunction with the gene therapy techniques described, below, in Section 5.4.3, to, for example, evaluate the normal and/or engineered bg-expressing cells prior to their introduction into the patient.

[0067] Anti-bg gene product antibodies may additionally be used as a method for the inhibition of abnormal bg gene product activity, in, for example, instances in which such abnormal activity is due to an increased level of bg gene product or to the presence of gain-of-function mutant bg gene products. Such antibodies may, therefore, be utilized as part of methods for the treatment of disorders caused by such abnormal bg gene product activity, including, for example, disorders involving abnormal intracellular vesicle differentiation and/or function.

[0068] For the production of antibodies against a bg gene product, various host animals may be immunized by injection with a bg gene product, or a portion thereof. Such host animals may include but are not limited to rabbits, mice, and rats, to name but a few. Various adjuvants may be used to increase the immunological response, depending on the host species, including but not limited to Freund's (complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, dinitrophenol, and potentially useful human adjuvants such as BCG (bacille Calmette-Guerin) and Corynebacterium parvum.

[0069] Polyclonal antibodies are heterogeneous populations of antibody molecules derived from the sera of animals immunized with an antigen, such as a bg gene product, or an antigenic functional derivative thereof. For the production of polyclonal antibodies, host animals such as those described above may be immunized by injection with bg gene product supplemented with adjuvants as also described above.

[0070] Monoclonal antibodies, which are homogeneous populations of antibodies to a particular antigen, may be obtained by any technique which provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique of Kohler and Milstein (1975, Nature 256:495-497; and U.S. Pat. No. 4,376,110), the human B-cell hybridoma technique (Kosbor et al., 1983, Immunology Today 4:72; Cole et al., 1983, Proc. Natl. Acad. Sci. USA 80:2026-2030), and the EBV-hybridoma technique (Cole et al., 1985, Monoclonal Antibodies And Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Such antibodies may be of any immunoglobulin class including IgG, IgM, IgE, IgA, IgD and any subclass thereof. The hybridoma producing the mAb of this invention may be cultivated in vitro or in vivo. Production of high titers of mAbs in vivo makes this the presently preferred method of production.

[0071] In addition, techniques developed for the production of “chimeric antibodies” (Morrison et al., 1984, Proc. Natl. Acad. Sci., 81:6851-6855; Neuberger et al., 1984, Nature, 312:604-608; Takeda et al., 1985, Nature, 314:452-454) by splicing the genes from a mouse antibody molecule of appropriate antigen specificity together with genes from a human antibody molecule of appropriate biological activity can be used. A chimeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable region derived from a murine mAb and a human immunoglobulin constant region.

[0072] Alternatively, techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778; Bird, 1988, Science 242:423-426; Huston et al., 1988, Proc. Natl. Acad. Sci. USA 85:5879-5883; and Ward et al., 1989, Nature 334:544-546) can be adapted to produce single chain antibodies against bg gene products. Single chain antibodies are formed by linking the heavy and light chain fragments of the Fv region via an amino acid bridge, resulting in a single chain polypeptide.

[0073] Antibody fragments which recognize specific epitopes may be generated by known techniques. For example, such fragments include but are not limited to: the F(ab′)₂ fragments which can be produced by pepsin digestion of the antibody molecule and the Fab fragments which can be generated by reducing the disulfide bridges of the F(ab′)₂ fragments. Alternatively, Fab expression libraries may be constructed (Huse et al., 1989, Science, 246:1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity.

5.4. Uses of the bg Gene, Gene Products, and Antibodies

[0074] Described herein are various applications of the bg gene, the bg gene product including peptide fragments thereof, and of antibodies directed against the bg gene product and peptide fragments thereof.

[0075] Such applications include, for example, prognostic and diagnostic evaluation of disorders involving abnormal intracellular vesicles, including, for example, abnormal lysosomes, melanosomes, platelet dense granules and cytolytic granules, including, but not limited to, Chediak-Higashi syndrome (CHS), and methods for the identification of subjects with a predisposition to such disorders and the identification of individuals carrying mutant bg alleles.

[0076] Such methods may, for example, utilize reagents such as the bg gene nucleotide sequences described in Section 5.1, and antibodies directed against bg gene products, including peptide fragments thereof, as described, above, in Section 5.3. Specifically, such reagents may be used, for example, for: (1) nucleic acid-based techniques for the detection of the presence of bg gene mutations, or the detection of either over- or under-expression of bg gene mRNA relative to levels known to be found in the normal state; and (2) peptide-based techniques for the detection of mutant BG proteins or either an over- or an under-abundance of BG relative to levels known to be found in the normal state.

[0077] The methods described herein may be performed, for example, by utilizing pre-packaged diagnostic kits comprising at least one specific bg gene nucleic acid or anti-bg gene antibody reagent described herein, which may be conveniently used, e.g., in clinical settings, to diagnose patients exhibiting intracellular vesicle disorder abnormalities.

[0078] Nucleic acid-based detection techniques are described, below, in Section 5.4.1. Peptide detection techniques are described, below, in Section 5.4.2.

[0079] Additionally, such applications include methods for the treatment of disorders involving abnormal intracellular vesicles, including CHS, as described, below, in Section 5.4.4, and for the identification of compounds which modulate the expression of the bg gene and/or the activity of the bg gene product, as described below, in Section 5.4.3. Such compounds can include, for example, other cellular products which are involved in normal differentiation and/or function of intracellular vesicles.

5.4.1. Detection of bg Gene Nucleic Acid Molecules

[0080] Mutations within the bg gene can be detected by utilizing a number of techniques. Nucleic acid from any nucleated cell can be used as the starting point for such assay techniques, and may be isolated according to standard nucleic acid preparation procedures which are well known to those of skill in the art.

[0081] DNA may be used in hybridization or amplification assays of biological samples to detect abnormalities involving bg gene structure, including point mutations, insertions, deletions and chromosomal rearrangements. Such assays may include, but are not limited to, Southern analyses, single stranded conformational polymorphism analyses (SSCP), and PCR analyses.

[0082] Such diagnostic methods for the detection of bg gene-specific mutations can involve, for example, contacting and incubating nucleic acids including recombinant DNA molecules, cloned genes or degenerate variants thereof, obtained from a sample, e.g., derived from a patient sample or other appropriate cellular source, with one or more labeled nucleic acid reagents including recombinant DNA molecules, cloned genes or degenerate variants thereof, as described in Section 5.1, under conditions favorable for the specific annealing of these reagents to their complementary sequences within the bg gene. Preferably, the lengths of these nucleic acid reagents are at least 15 to 30 nucleotides. After incubation, all non-annealed nucleic acids are removed from the nucleic acid:bg molecule hybrid. The presence of nucleic acids which have hybridized, if any such molecules exist, is then detected. Using such a detection scheme, the nucleic acid from the cell type or tissue of interest can be immobilized, for example, to a solid support such as a membrane, or a plastic surface such as that on a microtiter plate or polystyrene beads. In this case, after incubation, non-annealed, labeled nucleic acid reagents of the type described in Section 5.1 are easily removed. Detection of the remaining, annealed, labeled bg nucleic acid reagents is accomplished using standard techniques well-known to those in the art. The bg gene sequences to which the nucleic acid reagents have annealed can be compared to the annealing pattern expected from a normal bg gene sequence in order to determine whether a bg gene mutation is present.
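
The comparison of an observed annealing pattern with the pattern expected from a normal sequence can be sketched computationally. The following Python example is a deliberately simplified model that treats annealing as exact complementarity, whereas real hybridization depends on stringency and tolerates some mismatches. The sequences and function names are hypothetical and do not represent bg gene sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def annealing_sites(probe, target_strand, min_length=15):
    """Return 0-based positions on target_strand where the probe would
    anneal, modeled here as perfect complementarity over the full probe.

    min_length mirrors the preferred minimum reagent length of 15
    nucleotides stated in the text; exact matching is an assumption.
    """
    if len(probe) < min_length:
        raise ValueError("probe is shorter than the preferred minimum length")
    site = reverse_complement(probe)
    target = target_strand.upper()
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]

# Hypothetical "normal" strand and a variant carrying one point mutation.
NORMAL = "TTGACCGTAGGCTAACGTTAGCAAGGTC"
MUTANT = NORMAL[:10] + "A" + NORMAL[11:]

# An 18-nucleotide probe complementary to positions 4-21 of NORMAL.
PROBE = reverse_complement(NORMAL[4:22])

print(annealing_sites(PROBE, NORMAL))  # pattern from the normal sequence
print(annealing_sites(PROBE, MUTANT))  # pattern from the variant
```

Under this exact-match assumption, the probe anneals to the normal strand but not to the variant, so a difference between the two patterns flags a candidate mutation.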

[0083] Alternative diagnostic methods for the detection of bg gene specific nucleic acid molecules, in patient samples or other appropriate cell sources, may involve their amplification, e.g., by PCR (the experimental embodiment set forth in Mullis, K. B., 1987, U.S. Pat. No. 4,683,202), followed by the detection of the amplified molecules using techniques well known to those of skill in the art. The resulting amplified sequences can be compared to those which would be expected if the nucleic acid being amplified contained only normal copies of the bg gene in order to determine whether a bg gene mutation exists.
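
As an illustration of comparing an amplified sequence with the product expected from a normal template, the following Python sketch predicts the amplicon bounded by a primer pair and compares product lengths. It assumes perfect primer matches on a single template strand; the template, primers, and the 7 bp deletion are hypothetical and are not derived from the bg gene.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def amplicon(template, forward_primer, reverse_primer):
    """Return the region a primer pair would amplify, or None if either
    primer has no perfect match. template is the strand read 5' to 3';
    the reverse primer anneals to the complementary strand.
    """
    t = template.upper()
    start = t.find(forward_primer.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(reverse_primer)
    end = t.find(rev_site, start + len(forward_primer))
    if end == -1:
        return None
    return t[start:end + len(rev_site)]

# Hypothetical normal template and an allele with a 7 bp deletion.
NORMAL = "AAGCTTGCATGC" + "GATTACAGATTACA" + "CCGGAATTCGTA"
DELETION = "AAGCTTGCATGC" + "GATTACA" + "CCGGAATTCGTA"

FORWARD = "GCTTGCATGC"
REVERSE = reverse_complement("CCGGAATTCG")

expected = amplicon(NORMAL, FORWARD, REVERSE)
observed = amplicon(DELETION, FORWARD, REVERSE)
print(len(expected), len(observed))
```

A product shorter or longer than expected points to a deletion or insertion between the primers; point mutations would require sequence-level comparison of the two products rather than length alone.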

[0084] Additionally, well-known genotyping techniques can be performed to identify individuals carrying bg gene mutations. Such techniques include, for example, the use of restriction fragment length polymorphisms (RFLPs), which involve sequence variations in one of the recognition sites for the specific restriction enzyme used.
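
How a sequence variation in a recognition site changes the fragment pattern can be shown with a small in-silico digest. The Python sketch below uses EcoRI (recognition sequence GAATTC, cutting after the first G) as the example enzyme; the choice of enzyme, the linear test sequences, and the function name are illustrative assumptions.

```python
def fragment_lengths(sequence, site="GAATTC", cut_offset=1):
    """Return the lengths of the fragments produced by cutting a linear
    DNA sequence at every occurrence of a restriction site.

    Defaults model EcoRI, which recognizes GAATTC and cuts between G and
    AATTC (cut_offset=1). Only one strand is modeled.
    """
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Hypothetical alleles: the variant has a single base change (A to G)
# that destroys the second recognition site.
ALLELE_1 = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
ALLELE_2 = "AAAA" + "GAATTC" + "CCCCCC" + "GAGTTC" + "TT"

print(fragment_lengths(ALLELE_1))
print(fragment_lengths(ALLELE_2))
```

Loss of one site merges two fragments into a single longer one, which is the length difference an RFLP analysis detects.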

[0085] Additionally, improved methods for analyzing DNA polymorphisms which can be utilized for the identification of bg gene mutations have been described which capitalize on the presence of variable numbers of short, tandemly repeated DNA sequences between the restriction enzyme sites. For example, Weber (U.S. Pat. No. 5,075,217, which is incorporated herein by reference in its entirety) describes a DNA marker based on length polymorphisms in blocks of (dC-dA)n-(dG-dT)n short tandem repeats. The average separation of (dC-dA)n-(dG-dT)n blocks is estimated to be 30,000-60,000 bp. Markers which are so closely spaced exhibit a high frequency of co-inheritance, and are extremely useful in the identification of genetic mutations, such as, for example, mutations within the bg gene, and the diagnosis of diseases and disorders related to bg mutations.
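
The length polymorphism underlying such markers is simply a difference in the number of repeat units between alleles. A minimal Python sketch of counting the repeat number in a (dC-dA)n block follows; the flanking sequences and repeat counts are hypothetical.

```python
import re

def longest_repeat_run(sequence, unit="CA"):
    """Return the number of repeat units in the longest uninterrupted
    tandem run of `unit` within `sequence` (0 if the unit is absent)."""
    pattern = re.compile("(?:" + re.escape(unit.upper()) + ")+")
    runs = [m.group(0) for m in pattern.finditer(sequence.upper())]
    if not runs:
        return 0
    return max(len(run) for run in runs) // len(unit)

# Two hypothetical alleles of one marker that differ only in repeat number.
ALLELE_A = "GGT" + "CA" * 12 + "TTG"
ALLELE_B = "GGT" + "CA" * 15 + "TTG"

print(longest_repeat_run(ALLELE_A), longest_repeat_run(ALLELE_B))
```

Here the two alleles differ by three repeat units, i.e., 6 bp, which is the kind of length difference that is resolved after amplification across the repeat block.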

[0086] Also, Caskey et al. (U.S. Pat. No. 5,364,759, which is incorporated herein by reference in its entirety) describe a DNA profiling assay for detecting short tri- and tetranucleotide repeat sequences. The process includes extracting the DNA of interest, such as the bg gene, amplifying the extracted DNA, and labelling the repeat sequences to form a genotypic map of the individual's DNA.

[0087] The level and/or type of bg gene expression can also be assayed. For example, RNA from a cell type or tissue known, or suspected, to express the bg gene may be isolated and tested utilizing hybridization or PCR techniques such as are described, above. The isolated cells can be derived from cell culture or from a patient. The analysis of cells taken from culture may be a necessary step in the assessment of cells to be used as part of a cell-based gene therapy technique or, alternatively, to test the effect of compounds on the expression of the bg gene. Such analyses may reveal both quantitative and qualitative aspects of the expression pattern of the bg gene, including activation or inactivation of bg gene expression, as well as reveal the presence or absence of alternatively spliced forms of bg gene transcripts.

[0088] In one embodiment of such a detection scheme, a cDNA molecule is synthesized from an RNA molecule of interest (e.g., by reverse transcription of the RNA molecule into cDNA). A sequence within the cDNA is then used as the template for a nucleic acid amplification reaction, such as a PCR amplification reaction, or the like. The nucleic acid reagents used as synthesis initiation reagents (e.g., primers) in the reverse transcription and nucleic acid amplification steps of this method are chosen from among the bg gene nucleic acid reagents described in Section 5.1. The preferred lengths of such nucleic acid reagents are at least 9-30 nucleotides. For detection of the amplified product, the nucleic acid amplification may be performed using radioactively or non-radioactively labeled nucleotides. Alternatively, enough amplified product may be made such that the product may be visualized by standard ethidium bromide staining or by utilizing any other suitable nucleic acid staining method.

[0089] Additionally, it is possible to perform such bg gene expression assays “in situ”, i.e., directly upon tissue sections (fixed and/or frozen) of patient tissue obtained from biopsies or resections, such that no nucleic acid purification is necessary. Nucleic acid reagents such as those described in Section 5.1 may be used as probes and/or primers for such in situ procedures (see, for example, Nuovo, G. J., 1992, “PCR In Situ Hybridization: Protocols And Applications”, Raven Press, NY).

[0090] Alternatively, if a sufficient quantity of the appropriate cells can be obtained, standard Northern analysis can be performed to determine the level of mRNA expression of the bg gene.

5.4.2. Detection of bg Gene Products

[0091] Antibodies directed against wild type or mutant bg gene products or conserved variants or peptide fragments thereof, which are discussed, above, in Section 5.3, may also be used as diagnostics and prognostics for intracellular vesicle disorders, including, but not limited to, CHS, as described herein. Such diagnostic methods may be used to detect abnormalities in the level of bg gene expression, or abnormalities in the structure and/or temporal, tissue, cellular, or subcellular location of bg gene product. Further, such assays can be utilized to detect the presence or absence of bg gene products encoded by alternatively spliced bg gene transcripts. Given the intracellular vesicles affected by bg mutations, it is possible that the bg gene product is an intracellular gene product. The antibodies and immunoassay methods described below, therefore, have important in vitro applications in assessing the efficacy of treatments for such disorders. Antibodies, or fragments of antibodies, such as those described below, may be used to screen potentially therapeutic compounds in vitro to determine their effects on bg gene expression and bg peptide production. The compounds which have beneficial effects on intracellular vesicle disorders, such as, for example, CHS, can be identified, and a therapeutically effective dose determined.

[0092] In vitro immunoassays may also be used, for example, to assess the efficacy of cell-based gene therapy for intracellular vesicle disorders, including, for example, CHS. Antibodies directed against bg peptides may be used in vitro to determine the level of bg gene expression achieved in cells genetically engineered to produce bg peptides. Given that the bg gene product may represent an intracellular gene product, such an assessment is, preferably, done using cell lysates or extracts. Such analysis will allow for a determination of the number of transformed cells necessary to achieve therapeutic efficacy in vivo, as well as optimization of the gene replacement protocol.

[0093] The tissue or cell type to be analyzed will generally include those which are known, or suspected, to express the bg gene. The protein isolation methods employed herein may, for example, be such as those described in Harlow and Lane (Harlow, E. and Lane, D., 1988, “Antibodies: A Laboratory Manual”, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), which is incorporated herein by reference in its entirety. The isolated cells can be derived from cell culture or from a patient. The analysis of cells taken from culture may be a necessary step in the assessment of cells to be used as part of a cell-based gene therapy technique or, alternatively, to test the effect of compounds on the expression of the bg gene.

[0094] Preferred diagnostic methods for the detection of bg gene products or conserved variants or peptide fragments thereof may involve, for example, immunoassays wherein the bg gene products or conserved variants or peptide fragments are detected by their interaction with an anti-bg gene product-specific antibody.

[0095] For example, antibodies, or fragments of antibodies, such as those described, above, in Section 5.3, useful in the present invention may be used to quantitatively or qualitatively detect the presence of bg gene products or conserved variants or peptide fragments thereof. This can be accomplished, for example, by immunofluorescence techniques employing a fluorescently labeled antibody (see below, this Section) coupled with light microscopic, flow cytometric, or fluorimetric detection. Such techniques are especially preferred if such bg gene products are expressed on the cell surface.

[0096] The antibodies (or fragments thereof) useful in the present invention may, additionally, be employed histologically, as in immunofluorescence or immunoelectron microscopy, for in situ detection of bg gene products or conserved variants or peptide fragments thereof. In situ detection may be accomplished by removing a histological specimen from a patient, and applying thereto a labeled antibody of the present invention. The antibody (or fragment) is preferably applied by overlaying the labeled antibody (or fragment) onto a biological sample. Through the use of such a procedure, it is possible to determine not only the presence of the bg gene product, or conserved variants or peptide fragments, but also its distribution in the examined tissue. Using the present invention, those of ordinary skill will readily perceive that any of a wide variety of histological methods (such as staining procedures) can be modified in order to achieve such in situ detection.

[0097] Immunoassays for bg gene products or conserved variants or peptide fragments thereof will typically comprise incubating a sample, such as a biological fluid, a tissue extract, freshly harvested cells, or lysates of cells which have been incubated in cell culture, in the presence of a detectably labeled antibody capable of identifying bg gene products or conserved variants or peptide fragments thereof, and detecting the bound antibody by any of a number of techniques well-known in the art.

[0098] The biological sample may be brought in contact with and immobilized onto a solid phase support or carrier such as nitrocellulose, or other solid support which is capable of immobilizing cells, cell particles or soluble proteins. The support may then be washed with suitable buffers followed by treatment with the detectably labeled bg gene specific antibody. The solid phase support may then be washed with the buffer a second time to remove unbound antibody. The amount of bound label on the solid support may then be detected by conventional means.

[0099] By “solid phase support or carrier” is intended any support capable of binding an antigen or an antibody. Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amylases, natural and modified celluloses, polyacrylamides, gabbros, and magnetite. The nature of the carrier can be either soluble to some extent or insoluble for the purposes of the present invention. The support material may have virtually any possible structural configuration so long as the coupled molecule is capable of binding to an antigen or antibody. Thus, the support configuration may be spherical, as in a bead, or cylindrical, as in the inside surface of a test tube, or the external surface of a rod. Alternatively, the surface may be flat such as a sheet, test strip, etc. Preferred supports include polystyrene beads. Those skilled in the art will know many other suitable carriers for binding antibody or antigen, or will be able to ascertain the same by use of routine experimentation.

[0100] The binding activity of a given lot of anti-bg gene product antibody may be determined according to well known methods. Those skilled in the art will be able to determine operative and optimal assay conditions for each determination by employing routine experimentation.

[0101] One of the ways in which the bg gene peptide-specific antibody can be detectably labeled is by linking the same to an enzyme and using it in an enzyme immunoassay (EIA) (Voller, A., “The Enzyme Linked Immunosorbent Assay (ELISA)”, 1978, Diagnostic Horizons 2:1-7, Microbiological Associates Quarterly Publication, Walkersville, Md.; Voller, A. et al., 1978, J. Clin. Pathol. 31:507-520; Butler, J. E., 1981, Meth. Enzymol. 73:482-523; Maggio, E. (ed.), 1980, Enzyme Immunoassay, CRC Press, Boca Raton, Fla.; Ishikawa, E. et al., (eds.), 1981, Enzyme Immunoassay, Igaku Shoin, Tokyo). The enzyme which is bound to the antibody will react with an appropriate substrate, preferably a chromogenic substrate, in such a manner as to produce a chemical moiety which can be detected, for example, by spectrophotometric, fluorimetric or visual means. Enzymes which can be used to detectably label the antibody include, but are not limited to, malate dehydrogenase, staphylococcal nuclease, delta-5-steroid isomerase, yeast alcohol dehydrogenase, alpha-glycerophosphate dehydrogenase, triose phosphate isomerase, horseradish peroxidase, alkaline phosphatase, asparaginase, glucose oxidase, beta-galactosidase, ribonuclease, urease, catalase, glucose-6-phosphate dehydrogenase, glucoamylase and acetylcholinesterase. The detection can be accomplished by colorimetric methods which employ a chromogenic substrate for the enzyme. Detection may also be accomplished by visual comparison of the extent of enzymatic reaction of a substrate in comparison with similarly prepared standards.
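By way of illustration only, the comparison of a colorimetric reading against similarly prepared standards can be sketched as a linear interpolation between the two nearest standards. The function name, the data layout and the numerical values below are hypothetical and are not taken from this specification; other curve-fitting approaches may be equally suitable.

```python
def interpolate_concentration(standards, absorbance):
    """Estimate a concentration by linear interpolation between standards.

    standards: list of (concentration, absorbance) pairs for the prepared
    standards (hypothetical layout; units are whatever the assay uses).
    """
    # Sort by absorbance so neighbouring pairs bracket a reading.
    points = sorted(standards, key=lambda pair: pair[1])
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo <= absorbance <= a_hi:
            if a_hi == a_lo:
                return c_lo
            fraction = (absorbance - a_lo) / (a_hi - a_lo)
            return c_lo + fraction * (c_hi - c_lo)
    # Readings outside the standards are not extrapolated.
    raise ValueError("absorbance lies outside the range of the standards")


# Hypothetical standards: (concentration, absorbance)
example_standards = [(0.0, 0.0), (10.0, 0.5), (20.0, 1.0)]
estimate = interpolate_concentration(example_standards, 0.75)
```

A reading outside the range of the standards raises an error rather than being extrapolated, which is a deliberately conservative choice for this sketch.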

[0102] Detection may also be accomplished using any of a variety of other immunoassays. For example, by radioactively labeling the antibodies or antibody fragments, it is possible to detect bg gene peptides through the use of a radioimmunoassay (RIA) (see, for example, Weintraub, B., Principles of Radioimmunoassays, Seventh Training Course on Radioligand Assay Techniques, The Endocrine Society, March, 1986, which is incorporated by reference herein). The radioactive isotope can be detected by such means as the use of a gamma counter or a scintillation counter or by autoradiography.

[0103] It is also possible to label the antibody with a fluorescent compound. When the fluorescently labeled antibody is exposed to light of the proper wavelength, its presence can then be detected due to fluorescence. Among the most commonly used fluorescent labeling compounds are fluorescein isothiocyanate, rhodamine, phycoerythrin, phycocyanin, allophycocyanin, o-phthaldehyde and fluorescamine.

[0104] The antibody can also be detectably labeled using fluorescence emitting metals such as ¹⁵²Eu, or others of the lanthanide series. These metals can be attached to the antibody using such metal chelating groups as diethylenetriaminepentaacetic acid (DTPA) or ethylenediaminetetraacetic acid (EDTA).

[0105] The antibody also can be detectably labeled by coupling it to a chemiluminescent compound. The presence of the chemiluminescent-tagged antibody is then determined by detecting the presence of luminescence that arises during the course of a chemical reaction. Examples of particularly useful chemiluminescent labeling compounds are luminol, isoluminol, theromatic acridinium ester, imidazole, acridinium salt and oxalate ester.

[0106] Likewise, a bioluminescent compound may be used to label the antibody of the present invention. Bioluminescence is a type of chemiluminescence found in biological systems in which a catalytic protein increases the efficiency of the chemiluminescent reaction. The presence of a bioluminescent protein is determined by detecting the presence of luminescence. Important bioluminescent compounds for purposes of labeling are luciferin, luciferase and aequorin.

5.4.3. Screening Assays for Compounds that Modulate bg Activity

[0107] The following assays are designed to identify compounds that bind to bg gene products, compounds that bind to other intracellular proteins that interact with a bg gene product, compounds that interfere with the interaction of the bg gene product with other intracellular proteins, and compounds which modulate the activity of the bg gene (i.e., modulate the level of bg gene expression and/or modulate the level of bg gene product activity). Assays may additionally be utilized which identify compounds which bind to bg gene regulatory sequences (e.g., promoter sequences; see e.g., Platt, K. A., 1994, J. Biol. Chem. 269:28558-28562, which is incorporated herein by reference in its entirety), and which may modulate the level of bg gene expression. Compounds may include, but are not limited to, small organic molecules which are able to cross the blood-brain barrier, gain entry into an appropriate cell and affect expression of the bg gene or some other gene involved in the pathway or pathways regulating intracellular vesicle differentiation and/or function, or other intracellular proteins. Methods for the identification of such intracellular proteins are described, below, in Section 5.4.3.1. Such intracellular proteins may be involved in the differentiation and/or function of intracellular vesicles, including, but not limited to, lysosomes, melanosomes, platelet dense granules and cytolytic granules. Further, among these compounds are compounds which affect the level of bg gene expression and/or bg gene product activity and which can be used in the therapeutic treatment of disorders involving abnormal intracellular vesicles, including, but not limited to, abnormal lysosomes, melanosomes, platelet dense granules and cytolytic granules, including CHS, as described, below, in Section 5.4.4.

[0108] Compounds may include, but are not limited to, peptides such as, for example, soluble peptides, including but not limited to, Ig-tailed fusion peptides, and members of random peptide libraries (see, e.g., Lam, K. S. et al., 1991, Nature 354:82-84; Houghten, R. et al., 1991, Nature 354:84-86), and combinatorial chemistry-derived molecular library made of D- and/or L-configuration amino acids, phosphopeptides (including, but not limited to members of random or partially degenerate, directed phosphopeptide libraries; see, e.g., Songyang, Z. et al., 1993, Cell 72:767-778), antibodies (including, but not limited to, polyclonal, monoclonal, humanized, anti-idiotypic, chimeric or single chain antibodies, and Fab, F(ab′)₂ and Fab expression library fragments, and epitope-binding fragments thereof), and small organic or inorganic molecules.

[0109] Compounds identified via assays such as those described herein may be useful, for example, in elaborating the biological function of the bg gene product, and for ameliorating intracellular vesicle disorders such as, for example, CHS. Assays for testing the effectiveness of compounds, identified by, for example, techniques such as those described in Sections 5.4.3.1-5.4.3.3, are discussed, below, in Section 5.4.3.4.

5.4.3.1. In vitro Screening Assays for Compounds that Bind to the bg Gene Products

[0110] In vitro systems may be designed to identify compounds capable of binding the bg gene products of the invention. Compounds identified may be useful, for example, in modulating the activity of wild type and/or mutant bg gene products, may be useful in elaborating the biological function of the bg gene product, may be utilized in screens for identifying compounds that disrupt normal bg gene product interactions, or may in themselves disrupt such interactions.

[0111] The principle of the assays used to identify compounds that bind to the bg gene product involves preparing a reaction mixture of the bg gene product and the test compound under conditions and for a time sufficient to allow the two components to interact and bind, thus forming a complex which can be removed and/or detected in the reaction mixture. These assays can be conducted in a variety of ways. For example, one method to conduct such an assay would involve anchoring bg gene product or the test substance onto a solid phase and detecting bg gene product/test compound complexes anchored on the solid phase at the end of the reaction. In one embodiment of such a method, the bg gene product may be anchored onto a solid surface, and the test compound, which is not anchored, may be labeled, either directly or indirectly.

[0112] In practice, microtiter plates may conveniently be utilized as the solid phase. The anchored component may be immobilized by non-covalent or covalent attachments. Non-covalent attachment may be accomplished by simply coating the solid surface with a solution of the protein and drying. Alternatively, an immobilized antibody, preferably a monoclonal antibody, specific for the protein to be immobilized may be used to anchor the protein to the solid surface. The surfaces may be prepared in advance and stored.

[0113] In order to conduct the assay, the nonimmobilized component is added to the coated surface containing the anchored component. After the reaction is complete, unreacted components are removed (e.g., by washing) under conditions such that any complexes formed will remain immobilized on the solid surface. The detection of complexes anchored on the solid surface can be accomplished in a number of ways. Where the previously nonimmobilized component is pre-labeled, the detection of label immobilized on the surface indicates that complexes were formed. Where the previously nonimmobilized component is not pre-labeled, an indirect label can be used to detect complexes anchored on the surface; e.g., using a labeled antibody specific for the previously nonimmobilized component (the antibody, in turn, may be directly labeled or indirectly labeled with a labeled anti-Ig antibody).

[0114] Alternatively, a reaction can be conducted in a liquid phase, the reaction products separated from unreacted components, and complexes detected; e.g., using an immobilized antibody specific for bg gene product or the test compound to anchor any complexes formed in solution, and a labeled antibody specific for the other component of the possible complex to detect anchored complexes.

5.4.3.2. Assays for Intracellular Proteins that Interact with the bg Gene Product

[0115] Any method suitable for detecting protein-protein interactions may be employed for identifying bg protein-intracellular protein interactions.

[0116] Among the traditional methods which may be employed are co-immunoprecipitation, crosslinking and co-purification through gradients or chromatographic columns. Utilizing procedures such as these allows for the identification of intracellular proteins which interact with bg gene products. Once isolated, such an intracellular protein can be identified and can, in turn, be used, in conjunction with standard techniques, to identify proteins it interacts with. For example, at least a portion of the amino acid sequence of the intracellular protein which interacts with the bg gene product can be ascertained using techniques well known to those of skill in the art, such as via the Edman degradation technique (see, e.g., Creighton, 1983, “Proteins: Structures and Molecular Principles”, W.H. Freeman & Co., N.Y., pp. 34-49). The amino acid sequence obtained may be used as a guide for the generation of oligonucleotide mixtures that can be used to screen for gene sequences encoding such intracellular proteins. Screening may be accomplished, for example, by standard hybridization or PCR techniques. Techniques for the generation of oligonucleotide mixtures and the screening are well-known. (See, e.g., Ausubel, supra., and PCR Protocols: A Guide to Methods and Applications, 1990, Innis, M. et al., eds. Academic Press, Inc., New York).
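For illustration only, and not by way of limitation, the use of an amino acid sequence as a guide for generating oligonucleotide mixtures can be sketched as a back-translation under the standard genetic code. The sketch below counts how many distinct oligonucleotides a fully degenerate mixture would contain for a given peptide and picks the least degenerate window. The function names and the example peptide are hypothetical and do not correspond to any sequence disclosed herein.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the codons that encode it.
CODONS = {}
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(amino_acid, []).append("".join(codon))


def degeneracy(peptide):
    """Number of distinct oligonucleotides that could encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


def least_degenerate_window(peptide, length):
    """Return the window of the given length with the fewest encodings."""
    windows = [peptide[i:i + length] for i in range(len(peptide) - length + 1)]
    best = min(windows, key=degeneracy)
    return best, degeneracy(best)


def oligo_mixture(peptide):
    """Enumerate every oligonucleotide in the fully degenerate mixture."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


# Hypothetical peptide fragment, e.g. as might be read from Edman degradation.
window, count = least_degenerate_window("LSRMWDKC", 3)
mixture = oligo_mixture(window)
```

Windows rich in methionine and tryptophan, each encoded by a single codon, give the smallest mixtures, which is why such stretches are commonly favored when designing probes or primers.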

[0117] Additionally, methods may be employed which result in the simultaneous identification of genes which encode the intracellular protein interacting with the bg protein. These methods include, for example, probing expression libraries with labeled bg protein, using bg protein in a manner similar to the well known technique of antibody probing of λgt11 libraries.

[0118] One method which detects protein interactions in vivo, the two-hybrid system, is described in detail for illustration only and not by way of limitation. One version of this system has been described (Chien et al., 1991, Proc. Natl. Acad. Sci. USA, 88:9578-9582) and is commercially available from Clontech (Palo Alto, Calif.).

[0119] Briefly, utilizing such a system, plasmids are constructed that encode two hybrid proteins: one consists of the DNA-binding domain of a transcription activator protein fused to the bg gene product and the other consists of the transcription activator protein's activation domain fused to an unknown protein that is encoded by a cDNA which has been recombined into this plasmid as part of a cDNA library. The DNA-binding domain fusion plasmid and the cDNA library are transformed into a strain of the yeast Saccharomyces cerevisiae that contains a reporter gene (e.g., HIS3 or lacZ) whose regulatory region contains the transcription activator's binding site. Either hybrid protein alone cannot activate transcription of the reporter gene: the DNA-binding domain hybrid cannot because it does not provide activation function and the activation domain hybrid cannot because it cannot localize to the activator's binding sites. Interaction of the two hybrid proteins reconstitutes the functional activator protein and results in expression of the reporter gene, which is detected by an assay for the reporter gene product.
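The logic of the two-hybrid readout described above can be summarized in a minimal sketch: the reporter is expressed only when a DNA-binding domain and an activation domain are brought together, either within one protein or through interaction of two hybrids. The class, the function and the clone names below are hypothetical labels chosen for illustration; the sketch models the decision logic only, not any biochemical detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hybrid:
    name: str
    dna_binding_domain: bool
    activation_domain: bool


def reporter_expressed(hybrids, interactions):
    """Return True if a functional activator is reconstituted.

    interactions: set of frozenset({name_a, name_b}) pairs that bind.
    """
    # A single protein carrying both domains is already a full activator.
    if any(h.dna_binding_domain and h.activation_domain for h in hybrids):
        return True
    binders = [h for h in hybrids if h.dna_binding_domain]
    activators = [h for h in hybrids if h.activation_domain]
    # Otherwise the two domains must be joined by a bait/prey interaction.
    return any(
        frozenset({b.name, a.name}) in interactions
        for b in binders
        for a in activators
    )


bait = Hybrid("DBD-bg", dna_binding_domain=True, activation_domain=False)
prey_x = Hybrid("AD-cloneX", dna_binding_domain=False, activation_domain=True)
prey_y = Hybrid("AD-cloneY", dna_binding_domain=False, activation_domain=True)
known = {frozenset({"DBD-bg", "AD-cloneX"})}  # hypothetical interaction
```

In this sketch only the yeast cell carrying both the bait and the hypothetical interacting clone would score positive, mirroring the statement that neither hybrid protein alone can activate the reporter.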

[0120] The two-hybrid system or related methodology may be used to screen activation domain libraries for proteins that interact with the “bait” gene product. By way of example, and not by way of limitation, bg gene products may be used as the bait gene product. Total genomic or cDNA sequences are fused to the DNA encoding an activation domain. This library and a plasmid encoding a hybrid of a bait bg gene product fused to the DNA-binding domain are cotransformed into a yeast reporter strain, and the resulting transformants are screened for those that express the reporter gene. For example, and not by way of limitation, a bait bg gene sequence, such as the bg open reading frame sequence in FIG. 4, can be cloned into a vector such that it is translationally fused to the DNA encoding the DNA-binding domain of the GAL4 protein. These colonies are purified and the library plasmids responsible for reporter gene expression are isolated. DNA sequencing is then used to identify the proteins encoded by the library plasmids.

[0121] A cDNA library of the cell line from which proteins that interact with bait bg gene product are to be detected can be made using methods routinely practiced in the art. According to the particular system described herein, for example, the cDNA fragments can be inserted into a vector such that they are translationally fused to the transcriptional activation domain of GAL4. This library can be co-transformed along with the bait bg gene-GAL4 fusion plasmid into a yeast strain which contains a HIS3 gene driven by a promoter which contains GAL4 activation sequence. A cDNA encoded protein, fused to GAL4 transcriptional activation domain, that interacts with bait bg gene product will reconstitute an active GAL4 protein and thereby drive expression of the HIS3 gene. Colonies which express HIS3 can be detected by their growth on petri dishes containing semi-solid agar based media lacking histidine. The cDNA can then be purified from these strains, and used to produce and isolate the bait bg gene-interacting protein using techniques routinely practiced in the art.

5.4.3.3. Assays for Compounds that Interfere with bg Gene Product/Intracellular Macromolecule Interaction

[0122] The bg gene products of the invention may, in vivo, interact with one or more intracellular macromolecules, such as proteins. Such macromolecules may include, but are not limited to, nucleic acid molecules and those proteins identified via methods such as those described, above, in Section 5.4.3.2. For purposes of this discussion, such intracellular macromolecules are referred to herein as “binding partners”. Compounds that disrupt bg binding in this way may be useful in regulating the activity of the bg gene product, especially mutant bg gene products. Such compounds may include, but are not limited to, molecules such as peptides, and the like, as described, for example, in Section 5.4.3.1. above, which would be capable of gaining access to the intracellular bg gene product.

[0123] The basic principle of the assay systems used to identify compounds that interfere with the interaction between the bg gene product and its intracellular binding partner or partners involves preparing a reaction mixture containing the bg gene product, and the binding partner under conditions and for a time sufficient to allow the two to interact and bind, thus forming a complex. In order to test a compound for inhibitory activity, the reaction mixture is prepared in the presence and absence of the test compound. The test compound may be initially included in the reaction mixture, or may be added at a time subsequent to the addition of bg gene product and its intracellular binding partner. Control reaction mixtures are incubated without the test compound or with a placebo. The formation of any complexes between the bg gene protein and the intracellular binding partner is then detected. The formation of a complex in the control reaction, but not in the reaction mixture containing the test compound, indicates that the compound interferes with the interaction of the bg gene protein and the interactive binding partner. Additionally, complex formation within reaction mixtures containing the test compound and normal bg gene protein may also be compared to complex formation within reaction mixtures containing the test compound and a mutant bg gene protein. This comparison may be important in those cases wherein it is desirable to identify compounds that disrupt interactions of mutant but not normal bg gene proteins.
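As a hedged sketch of how the comparison between control and test reaction mixtures might be quantified, the signal from detected complexes can be expressed as a percent inhibition relative to the control. The function names, the background correction and the 50% threshold below are assumptions made for illustration only; the specification does not prescribe any particular calculation or cutoff.

```python
def percent_inhibition(control_signal, test_signal, background=0.0):
    """Percent reduction in complex signal caused by the test compound.

    control_signal: signal from the reaction without test compound.
    test_signal: signal from the reaction containing the test compound.
    background: signal from a blank reaction (assumed, optional).
    """
    control = control_signal - background
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - (test_signal - background) / control)


def is_mutant_selective(inhibition_mutant, inhibition_normal, threshold=50.0):
    """Flag compounds that disrupt mutant but not normal complexes.

    The threshold is an arbitrary illustrative cutoff, not a disclosed value.
    """
    return inhibition_mutant >= threshold and inhibition_normal < threshold


# Hypothetical readings (arbitrary units)
normal = percent_inhibition(control_signal=1000.0, test_signal=900.0)
mutant = percent_inhibition(control_signal=1000.0, test_signal=250.0)
selective = is_mutant_selective(mutant, normal)
```

The second function reflects the comparison between normal and mutant bg gene protein described above, where a compound of interest disrupts complexes formed by the mutant protein while leaving those of the normal protein largely intact.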

[0124] The assay for compounds that interfere with the interaction of the bg gene products and binding partners can be conducted in a heterogeneous or homogeneous format. Heterogeneous assays involve anchoring either the bg gene product or the binding partner onto a solid phase and detecting complexes anchored on the solid phase at the end of the reaction. In homogeneous assays, the entire reaction is carried out in a liquid phase. In either approach, the order of addition of reactants can be varied to obtain different information about the compounds being tested. For example, test compounds that interfere with the interaction between the bg gene products and the binding partners, e.g., by competition, can be identified by conducting the reaction in the presence of the test substance; i.e., by adding the test substance to the reaction mixture prior to or simultaneously with the bg gene protein and interactive intracellular binding partner. Alternatively, test compounds that disrupt preformed complexes, e.g. compounds with higher binding constants that displace one of the components from the complex, can be tested by adding the test compound to the reaction mixture after complexes have been formed. The various formats are described briefly below.

[0125] In a heterogeneous assay system, either the bg gene product or the interactive intracellular binding partner is anchored onto a solid surface, while the non-anchored species is labeled, either directly or indirectly. In practice, microtiter plates are conveniently utilized. The anchored species may be immobilized by non-covalent or covalent attachments. Non-covalent attachment may be accomplished simply by coating the solid surface with a solution of the bg gene product or binding partner and drying. Alternatively, an immobilized antibody specific for the species to be anchored may be used to anchor the species to the solid surface. The surfaces may be prepared in advance and stored.

[0126] In order to conduct the assay, the partner of the immobilized species is exposed to the coated surface with or without the test compound. After the reaction is complete, unreacted components are removed (e.g., by washing) and any complexes formed will remain immobilized on the solid surface. The detection of complexes anchored on the solid surface can be accomplished in a number of ways. Where the non-immobilized species is pre-labeled, the detection of label immobilized on the surface indicates that complexes were formed. Where the non-immobilized species is not pre-labeled, an indirect label can be used to detect complexes anchored on the surface; e.g., using a labeled antibody specific for the initially non-immobilized species (the antibody, in turn, may be directly labeled or indirectly labeled with a labeled anti-Ig antibody). Depending upon the order of addition of reaction components, test compounds which inhibit complex formation or which disrupt preformed complexes can be detected.

[0127] Alternatively, the reaction can be conducted in a liquid phase in the presence or absence of the test compound, the reaction products separated from unreacted components, and complexes detected; e.g., using an immobilized antibody specific for one of the binding components to anchor any complexes formed in solution, and a labeled antibody specific for the other partner to detect anchored complexes. Again, depending upon the order of addition of reactants to the liquid phase, test compounds which inhibit complex formation or which disrupt preformed complexes can be identified.

[0128] In an alternate embodiment of the invention, a homogeneous assay can be used. In this approach, a preformed complex of the bg gene protein and the interactive intracellular binding partner is prepared in which either the bg gene product or its binding partner is labeled, but the signal generated by the label is quenched due to complex formation (see, e.g., U.S. Pat. No. 4,109,496 by Rubenstein which utilizes this approach for immunoassays). The addition of a test substance that competes with and displaces one of the species from the preformed complex will result in the generation of a signal above background. In this way, test substances which disrupt bg gene protein/intracellular binding partner interaction can be identified.

[0129] In a particular embodiment, the bg gene product can be prepared for immobilization using recombinant DNA techniques described in Section 5.2. above. For example, the bg coding region can be fused to a glutathione-S-transferase (GST) gene using a fusion vector, such as pGEX-5X-1, in such a manner that its binding activity is maintained in the resulting fusion protein. The interactive intracellular binding partner can be purified and used to raise a monoclonal antibody, using methods routinely practiced in the art and described above, in Section 5.3. This antibody can be labeled with the radioactive isotope ¹²⁵I, for example, by methods routinely practiced in the art. In a heterogeneous assay, e.g., the GST-bg fusion protein can be anchored to glutathione-agarose beads. The interactive intracellular binding partner can then be added in the presence or absence of the test compound in a manner that allows interaction and binding to occur. At the end of the reaction period, unbound material can be washed away, and the labeled monoclonal antibody can be added to the system and allowed to bind to the complexed components. The interaction between the bg gene protein and the interactive intracellular binding partner can be detected by measuring the amount of radioactivity that remains associated with the glutathione-agarose beads. A successful inhibition of the interaction by the test compound will result in a decrease in measured radioactivity.

[0130] Alternatively, the GST-bg gene fusion protein and the interactive intracellular binding partner can be mixed together in liquid in the absence of the solid glutathione-agarose beads. The test compound can be added either during or after the species are allowed to interact. This mixture can then be added to the glutathione-agarose beads and unbound material is washed away. Again the extent of inhibition of the bg gene product/binding partner interaction can be detected by adding the labeled antibody and measuring the radioactivity associated with the beads.

[0131] In another embodiment of the invention, these same techniques can be employed using peptide fragments that correspond to the binding domains of the bg protein and/or the interactive intracellular binding partner (in cases where the binding partner is a protein), in place of one or both of the full length proteins. Any number of methods routinely practiced in the art can be used to identify and isolate the binding sites. These methods include, but are not limited to, mutagenesis of the gene encoding one of the proteins and screening for disruption of binding in a co-immunoprecipitation assay. Compensating mutations in the gene encoding the second species in the complex can then be selected. Sequence analysis of the genes encoding the respective proteins will reveal the mutations that correspond to the region of the protein involved in interactive binding. Alternatively, one protein can be anchored to a solid surface using methods described in this Section above, and allowed to interact with and bind to its labeled binding partner, which has been treated with a proteolytic enzyme, such as trypsin. After washing, a short, labeled peptide comprising the binding domain may remain associated with the solid material, which can be isolated and identified by amino acid sequencing. Also, once the gene coding for the intracellular binding partner is obtained, short gene segments can be engineered to express peptide fragments of the protein, which can then be tested for binding activity and purified or synthesized.

[0132] For example, and not by way of limitation, a bg gene product can be anchored to a solid material as described, above, in this Section by making a GST-bg fusion protein and allowing it to bind to glutathione agarose beads. The interactive intracellular binding partner can be labeled with radioactive isotope, such as ³⁵S, and cleaved with a proteolytic enzyme such as trypsin. Cleavage products can then be added to the anchored GST-bg fusion protein and allowed to bind. After washing away unbound peptides, labeled bound material, representing the intracellular binding partner binding domain, can be eluted, purified, and analyzed for amino acid sequence by well-known methods. Peptides so identified can be produced synthetically or fused to appropriate facilitative proteins using recombinant DNA technology.

5.4.3.4. Assays for Identification of Compounds that Ameliorate Intracellular Vesicle Disorders

[0133] Compounds, including but not limited to binding compounds identified via assay techniques such as those described, above, in Sections 5.4.3.1-5.4.3.3, can be tested for the ability to ameliorate intracellular vesicle disorder symptoms, including symptoms associated with CHS. It should be noted that although bg gene products may be intracellular molecules which are not secreted and have no transmembrane component, the assays described herein can identify compounds which affect bg gene activity by either affecting bg gene expression or by affecting the level of bg gene product activity. For example, compounds may be identified which are involved in another step in the pathway in which the bg gene and/or bg gene product is involved and, by affecting this same pathway, may modulate the effect of bg on the development of intracellular vesicle disorders. Such compounds can be used as part of a therapeutic method for the treatment of intracellular vesicle disorders, including, for example, CHS.

[0134] Described below are cell-based and animal model-based assays for the identification of compounds exhibiting such an ability to ameliorate intracellular vesicle disorder symptoms.

[0135] First, cell-based systems can be used to identify compounds which may act to ameliorate intracellular vesicle disorder symptoms. Such cell systems can include, for example, recombinant or non-recombinant cells, such as cell lines, which express the bg gene. Further, such cell systems can include, for example, recombinant or non-recombinant cells, such as cell lines, which express mutant forms of the bg gene and/or which exhibit elements of the bg phenotype. For example, bg fibroblast cells, Aleutian mink cells or human Chediak-Higashi cells, as described, below, in Sections 7 and 8, can be used.

[0136] In utilizing such cell systems, cells may be exposed to a compound, suspected of exhibiting an ability to ameliorate intracellular vesicle disorder symptoms, at a sufficient concentration and for a time sufficient to elicit such an amelioration in the exposed cells. After exposure, the cells can be assayed to measure alterations in the expression of the bg gene, e.g., by assaying cell lysates for bg mRNA transcripts (e.g., by Northern analysis) or for bg protein expressed in the cell; compounds which increase expression of the bg gene are good candidates as therapeutics. Alternatively, the cells are examined to determine whether one or more aspects of the bg cellular phenotype has been altered to resemble a more normal or more wild type phenotype, or a phenotype more likely to produce a lower incidence or severity of intracellular disorder symptoms.

[0137] In addition, animal-based intracellular vesicle disorder systems, which may include, for example, bg mice, may be used to identify compounds capable of ameliorating intracellular vesicle disorder-like symptoms (e.g., bg phenotype). Such animal models may be used as test substrates for the identification of drugs, pharmaceuticals, therapies and interventions which may be effective in treating such disorders. For example, animal models may be exposed to a compound, suspected of exhibiting an ability to ameliorate intracellular vesicle disorder symptoms, at a sufficient concentration and for a time sufficient to elicit such an amelioration of the symptoms in the exposed animals. The response of the animals to the exposure may be monitored by assessing the reversal of disorders associated with intracellular vesicle disorders such as CHS.

[0138] With regard to intervention, any treatments which reverse any aspect of the intracellular disorder-like symptoms should be considered as candidates for human intracellular disorder therapeutic intervention. Dosages of test agents may be determined by deriving dose-response curves, as discussed in Section 5.5.1, below.

5.4.4. Compounds and Methods for the Treatment of Intracellular Vesicle Disorders

[0139] Described below are methods and compositions whereby intracellular vesicle disorders, including, but not limited to, CHS may be treated. Because loss of normal bg gene product function results in the development of a bg, or intracellular vesicle disorder, phenotype, an increase in bg gene product activity would facilitate progress towards a normal state in individuals exhibiting a deficient level of bg gene expression and/or bg gene product activity.

[0140] Alternatively, it is conceivable that symptoms of certain intracellular vesicle disorders may be ameliorated by decreasing the level of bg gene expression and/or bg gene product activity. For example, bg gene sequences may be utilized in conjunction with well-known antisense, gene “knock-out,” ribozyme and/or triple helix methods to decrease the level of bg gene expression.

[0141] With respect to an increase in the level of normal bg gene expression and/or bg gene product activity, bg gene nucleic acid sequences, described, above, in Section 5.1, can, for example, be utilized for the treatment of intracellular vesicle disorders, including CHS. Such treatment can be administered, for example, in the form of gene replacement therapy. Specifically, one or more copies of a normal bg gene or a portion of the bg gene that directs the production of a bg gene product exhibiting normal bg gene function, may be inserted into the appropriate cells within a patient, using vectors which include, but are not limited to, adenovirus, adeno-associated virus, and retrovirus vectors, in addition to other particles that introduce DNA into cells, such as liposomes.

[0142] It is conceivable that it may be advantageous to achieve bg gene expression in the brain, given the large number of cell types affected by the bg and CHS phenotypes. As such, gene replacement therapy techniques may be utilized which are capable of delivering bg gene sequences to these cell types within patients. Thus, the techniques for delivery of bg gene sequences should either be able to readily cross the blood-brain barrier, using methods which are well known to those of skill in the art (see, e.g., PCT application, publication No. WO89/10134, which is incorporated herein by reference in its entirety), or, alternatively, should involve direct administration of such bg gene sequences to the site of the cells in which the bg gene sequences are to be expressed. With respect to delivery which is capable of crossing the blood-brain barrier, viral vectors such as, for example, those described above, are preferable.

[0143] Additional methods which may be utilized to increase the overall level of bg gene expression and/or bg gene product activity include the introduction of appropriate bg-expressing cells, preferably autologous cells, into a patient at positions and in numbers which are sufficient to ameliorate the symptoms of intracellular vesicle disorders, including CHS. Such cells may be either recombinant or non-recombinant.

[0144] Alternatively, cells, preferably autologous cells, can be engineered to express bg gene sequences which may then be introduced into a patient in positions appropriate for the amelioration of intracellular vesicle disorder symptoms. Alternatively, cells from MHC-matched, wild-type (i.e., non-bg) individuals which express the bg gene may be utilized, and may include, for example, hypothalamic cells. The expression of the bg gene sequences is controlled by the appropriate gene regulatory sequences to allow such expression in the necessary cell types. Such gene regulatory sequences are well known to the skilled artisan. Such cell-based gene therapy techniques are well known to those skilled in the art; see, e.g., Anderson, F., U.S. Pat. No. 5,399,349.

[0145] When the cells to be administered are non-autologous cells, they can be administered using well known techniques which prevent a host immune response against the introduced cells from developing. For example, the cells may be introduced in an encapsulated form which, while allowing for an exchange of components with the immediate extracellular environment, does not allow the introduced cells to be recognized by the host immune system.

[0146] Additionally, compounds, such as those identified via techniques such as those described, above, in Section 5.4.3, which are capable of modulating bg gene product activity can be administered using standard techniques which are well known to those of skill in the art.

5.5. Pharmaceutical Preparations and Methods of Administration

[0147] The compounds that are determined to affect bg gene expression or gene product activity can be administered to a patient at therapeutically effective doses to treat or ameliorate intracellular vesicle disorders, including CHS. A therapeutically effective dose refers to that amount of the compound sufficient to result in amelioration of symptoms of intracellular vesicle disorders, including elements associated with the bg phenotype and/or the CHS phenotype.

5.5.1. Effective Dose

[0148] Toxicity and therapeutic efficacy of such compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD₅₀ (the dose lethal to 50% of the population) and the ED₅₀ (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD₅₀/ED₅₀. Compounds which exhibit large therapeutic indices are preferred. While compounds that exhibit toxic side effects may be used, care should be taken to design a delivery system that targets such compounds to the site of affected tissue in order to minimize potential damage to unaffected cells and, thereby, reduce side effects.
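The therapeutic index defined above is a simple ratio, and a minimal sketch of the calculation is given here for illustration. The function name and the dose values are hypothetical and are not taken from this specification; both doses are assumed to be expressed in the same units.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index LD50/ED50.

    Both doses must be positive and expressed in the same units
    (for example mg/kg); the result is dimensionless.
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical doses in mg/kg, chosen only to illustrate the arithmetic.
example_ti = therapeutic_index(ld50=500.0, ed50=25.0)
print(f"therapeutic index = {example_ti:.1f}")
```

A larger value indicates a wider margin between the effective dose and the lethal dose, which is why compounds with large therapeutic indices are preferred.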

[0149] The data obtained from the cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage of such compounds lies preferably within a range of circulating concentrations that include the ED₅₀ with little or no toxicity. The dosage may vary within this range depending upon the dosage form employed and the route of administration utilized. For any compound used in the method of the invention, the therapeutically effective dose can be estimated initially from cell culture assays. A dose may be formulated in animal models to achieve a circulating plasma concentration range that includes the IC₅₀ (i.e., the concentration of the test compound which achieves a half-maximal inhibition of symptoms) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma may be measured, for example, by high performance liquid chromatography.
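By way of illustration only, one simple way to estimate an IC₅₀ from cell culture dose-response data is to interpolate, on a logarithmic concentration scale, between the two tested concentrations that bracket 50% inhibition. The sketch below assumes this interpolation approach, which is not prescribed by this specification; the function name and the data points are hypothetical.

```python
import math


def interpolate_ic50(concentrations, inhibitions):
    """Estimate the concentration giving 50% inhibition.

    concentrations: increasing, positive test concentrations.
    inhibitions: fractional inhibition (0 to 1) observed at each one.
    Uses linear interpolation on log10(concentration) between the two
    adjacent points that bracket 0.5.
    """
    points = list(zip(concentrations, inhibitions))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo <= 0.5 <= y_hi and y_hi > y_lo:
            frac = (0.5 - y_lo) / (y_hi - y_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("response does not cross 50% within the tested range")


# Hypothetical dose-response data (concentration in micromolar).
concs = [0.1, 1.0, 10.0, 100.0]
inhib = [0.1, 0.3, 0.7, 0.9]
print(f"estimated IC50 = {interpolate_ic50(concs, inhib):.2f} uM")
```

In practice a fitted sigmoidal model may be preferable when enough data points are available; the interpolation above is only the simplest estimate.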

5.5.2. Formulations and Use

[0150] Pharmaceutical compositions for use in accordance with the present invention may be formulated in conventional manner using one or more physiologically acceptable carriers or excipients.

[0151] Thus, the compounds and their physiologically acceptable salts and solvates may be formulated for administration by inhalation or insufflation (either through the mouth or the nose) or oral, buccal, parenteral or rectal administration. For oral administration, the pharmaceutical compositions may take the form of, for example, tablets or capsules prepared by conventional means with pharmaceutically acceptable excipients such as binding agents (e.g., pregelatinised maize starch, polyvinylpyrrolidone or hydroxypropyl methylcellulose); fillers (e.g., lactose, microcrystalline cellulose or calcium hydrogen phosphate); lubricants (e.g., magnesium stearate, talc or silica); disintegrants (e.g., potato starch or sodium starch glycolate); or wetting agents (e.g., sodium lauryl sulphate). The tablets may be coated by methods well known in the art. Liquid preparations for oral administration may take the form of, for example, solutions, syrups or suspensions, or they may be presented as a dry product for constitution with water or other suitable vehicle before use. Such liquid preparations may be prepared by conventional means with pharmaceutically acceptable additives such as suspending agents (e.g., sorbitol syrup, cellulose derivatives or hydrogenated edible fats); emulsifying agents (e.g., lecithin or acacia); non-aqueous vehicles (e.g., almond oil, oily esters, ethyl alcohol or fractionated vegetable oils); and preservatives (e.g., methyl or propyl-p-hydroxybenzoates or sorbic acid). The preparations may also contain buffer salts, flavoring, coloring and sweetening agents as appropriate.

[0152] Preparations for oral administration may be suitably formulated to give controlled release of the active compound.

[0153] For buccal administration the compositions may take the form of tablets or lozenges formulated in conventional manner.

[0154] For administration by inhalation, the compounds for use according to the present invention are conveniently delivered in the form of an aerosol spray presentation from pressurized packs or a nebuliser, with the use of a suitable propellant, e.g., dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane, carbon dioxide or other suitable gas. In the case of a pressurized aerosol the dosage unit may be determined by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for use in an inhaler or insufflator may be formulated containing a powder mix of the compound and a suitable powder base such as lactose or starch.

[0155] The compounds may be formulated for parenteral administration by injection, e.g., by bolus injection or continuous infusion. Formulations for injection may be presented in unit dosage form, e.g., in ampoules or in multi-dose containers, with an added preservative. The compositions may take such forms as suspensions, solutions or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as suspending, stabilizing and/or dispersing agents. Alternatively, the active ingredient may be in powder form for constitution with a suitable vehicle, e.g., sterile pyrogen-free water, before use.

[0156] The compounds may also be formulated in rectal compositions such as suppositories or retention enemas, e.g., containing conventional suppository bases such as cocoa butter or other glycerides.

[0157] In addition to the formulations described previously, the compounds may also be formulated as a depot preparation. Such long acting formulations may be administered by implantation (for example, subcutaneously or intramuscularly) or by intramuscular injection. Thus, for example, the compounds may be formulated with suitable polymeric or hydrophobic materials (for example, as an emulsion in an acceptable oil) or ion exchange resins, or as sparingly soluble derivatives, for example, as a sparingly soluble salt.

[0158] The compositions may, if desired, be presented in a pack or dispenser device which may contain one or more unit dosage forms containing the active ingredient. The pack may, for example, comprise metal or plastic foil, such as a blister pack. The pack or dispenser device may be accompanied by instructions for administration.

6. EXAMPLE Genetic and Physical Mapping of the bg Gene

[0159] The Example presented in this Section describes genetic mapping of the murine bg locus into a minimal genetic interval of 0.41 cM +/− 0.1 cM on murine chromosome 13. Physical mapping of this minimal bg genetic interval is established herein to be approximately 1 Mb.

6.1 Material and Methods

[0160] Mouse crosses segregating beige. Multiple strain crosses were established to maximize inter-strain variation in order to facilitate detection of polymorphisms of mapping markers. These included i) (C57BL/6J-bg^(J) X DBA/2J) X C57BL/6J-bg^(J); ii) (DBA/2 Co-bg^(8J) X C57BL/6J) X DBA/2 Co-bg^(8J); iii) (C3H/HeJ-bg^(2J) X CAST/Ei) X C3H/HeJ-bg^(2J); iv) (C57BL/6J-bg^(J) X CAST/Ei) X C57BL/6J-bg^(J); v) (DBA/2 Co-bg^(8J) X CAST/Ei) X DBA/2 Co-bg^(8J). The offspring of each of these backcrosses were analyzed, by coat color, for their bg genotype. Genomic DNA was made from a tail clip from each and analyzed for multiple simple sequence length repeat polymorphisms (SSLP). Not all strain combinations were polymorphic for all markers. Additional loci, Nidogen (Nid) and Ras like protein 1 (Rasl1), were also genotyped in mice from crosses utilizing CAST/Ei. CAST/Ei vs inbred strain polymorphisms were detected using single stranded conformational polymorphism (SSCP). The primers used for Nid were: forward 5′-CAGTGGAATGACCACCAGGCC-3′ and reverse 5′-GTTGCAGGCATGTACCACTAC-3′ (from mouse cDNA sequence, NCBI GenInfo ID: 53383). The Rasl1 primers were: forward 5′-TATGAACCTACCAAAGCAGAC-3′ and reverse 5′-ACTTCGGAAGTAGTTGTCTC-3′ (from rat RALA cDNA sequence, GenBank Accession: L19698). The PCR amplification conditions were 94° C. for 2 minutes, 0.15 U of AmpliTaq was added for a hot start, followed by 30 cycles of 94° C. for 40 secs, 55° C. for 50 secs, 72° C. for 30 secs. The products were run either on a nondenaturing 8% acrylamide gel at 45 W, room temperature for 3 hours, for SSLP analysis or, for SSCP analysis, on a 10% acrylamide gel run at 20 W, 4° C. for 2.5 hours. Both types of gel were stained, post running, with SYBR Green I and scanned on an MD Fluorimager.
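For illustration, the PCR amplification conditions given above can be written down as a small data structure, from which the total programmed hold time follows directly. This is a sketch only: the class and variable names are hypothetical, and the calculation ignores temperature ramp times and the pause for the hot-start enzyme addition, neither of which is specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # programmed hold time at that temperature


# Conditions as stated in the text: 94 C for 2 minutes, then 30 cycles
# of 94 C for 40 s, 55 C for 50 s and 72 C for 30 s.
INITIAL_DENATURATION = Step(94, 120)
CYCLE = (Step(94, 40), Step(55, 50), Step(72, 30))
N_CYCLES = 30


def total_hold_seconds(initial: Step, cycle, n_cycles: int) -> int:
    """Sum the programmed hold times, excluding ramping between steps."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle)


total = total_hold_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES)
print(f"programmed hold time: {total} s ({total / 60:.0f} min)")
```

With these values each cycle holds for 120 seconds, so the programmed hold time is 3720 seconds, or 62 minutes, before ramping is taken into account.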

[0161] A linkage map of all loci, including bg, was constructed manually for proximal MMU Chr 13 by minimizing double and multiple crossovers.

[0162] Interspecific backcross mapping. One hundred and eighty eight (C57BL/6J X Mus spretus) X C57BL/6J backcross mice were generated and genomic DNA was prepared to create a BSB mapping panel. A framework map was established using 80 previously mapped SSLP markers which encompassed each chromosome. The conditions used for SSLP analysis were as described above. Linkage maps were constructed using Map Manager v2.6.5 (Manley, K. F., 1993, Mammalian Genome 4:303-313).

[0163] Additional loci, Nid, Rasl1, Ryanodine receptor 2 (Ryr2) and Neutrophil oxidase factor 2-related sequence (Ncf2-rs), were also placed on the Chr 13 map using SSCP, as each gave a C57BL/6J vs Mus spretus polymorphism. Nid and Rasl1 were typed as described above. Ryr2 was analyzed by SSCP using the following primers (from mouse cDNA sequence, NCBI GenInfo ID: 516278): forward 5′-CAAAGAAAGCCCTCAGAAAC-3′ and reverse 5′-AAAGAGGAAAACCCAAGACT-3′. Ncf2-rs was also analyzed by SSCP using the following primers (designed from human cDNA sequence, GenBank Accession No. U00776): forward 5′-CAAAAACAAGACACCCAAGT-3′ and reverse 5′-TGTGGAATTGAGTGTTGTAG-3′.

[0164] Physical map of bg minimal interval. The Whitehead mouse YAC library (Research Genetics; Huntsville, Ala.) was screened using SSLP markers including D13Mit173, D13Mit44 and D13Mit305 using the PCR conditions described above. The YAC end clones were isolated according to standard methods. The YAC end clones were sequenced on an ABI sequencer. PCR primers from each unique end clone were designed, and used to map the end clone back to the mouse genome on the BSB map to check for chimeric YACs. Those ends that mapped to the correct region of MMU Chr 13 were subsequently used in further rounds of YAC library screening. Cross addressing of the various SSLP markers and YAC end clones allowed a full YAC contig across the bg minimal genetic interval to be established. BACs were also isolated across the physical region using markers from the region.

[0165] In order to size the YACs, yeast genomic DNA was prepared according to the New England Biolabs Imbed procedure. Contour clamped homogeneous electric field (CHEF) electrophoresis was carried out using a CHEF MAPPER electrophoresis apparatus (Bio-Rad Laboratories, Inc., Hercules, Calif.) for 28 hours on a 1% agarose gel with an electric field gradient of 6 V/cm at 14° C. and a pulse time of 12.55 sec.

6.2 Results

[0166] The mapping procedures described in Section 6.1, above, and depicted in FIG. 1, yielded a genetic map which gave a minimal genetic interval for the bg locus of 0.41±0.1 cM. The data giving this result are summarized herein.

[0167] The proximal interval, between bg and D13Mit173, was 2/690 recombinants which was 0.14±0.1 cM. The distal interval, between bg and D13Mit305, was 8/1496 recombinants which was 0.27±0.1 cM. The non-recombinant SSLP marker D13Mit44 was typed in 690 animals, giving an upper genetic distance (at the 95% confidence limit) between bg and D13Mit44 of 0.4 cM.
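The 95% upper limit quoted for a marker showing no recombinants can be reproduced with a short calculation. The sketch below is an illustration under stated assumptions, not a statement of the method actually used: it assumes an exact binomial bound, in which the upper limit on the recombination fraction θ satisfies (1 − θ)^N = 0.05 for N informative animals, and it treats 1% recombination as 1 cM, which is reasonable at such short distances. The function name is hypothetical.

```python
def zero_recombinant_upper_bound_cm(n_animals: int, confidence: float = 0.95) -> float:
    """Upper confidence limit, in cM, when 0 recombinants are seen in n_animals.

    Solves (1 - theta) ** n_animals == 1 - confidence for theta and
    converts the recombination fraction to centimorgans (1% = 1 cM).
    """
    if n_animals <= 0:
        raise ValueError("n_animals must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = 1.0 - confidence
    theta = 1.0 - alpha ** (1.0 / n_animals)
    return 100.0 * theta


# 690 animals typed with no recombinants, as for D13Mit44 above.
print(f"{zero_recombinant_upper_bound_cm(690):.2f} cM")
```

For 690 animals this gives a bound of about 0.43 cM, consistent with the 0.4 cM stated above.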

[0168] The homologous genes Nid, which maps to human Chr 1q43, and Rasl1, which maps to human Chr 7p, were also placed on the bg genetic map. Nid was non-recombinant in 690 mice, putting it within (at the 95% confidence limit) 0.4 cM of bg. Rasl1 mapped 0.72 cM distal of bg. The homologous gene Ryr2, which maps to human Chr 1q43-q42, was not mapped in the bg segregating crosses, as no polymorphism between any of the strains used was found, but it was mapped in the BSB mapping panel. By inference from the BSB map, Ryr2 maps within 1.6 cM of Nid (95% confidence limit). The simplest interpretation of the mapping of homologous genes to the map of proximal mouse Chr 13 around the bg region is that bg is likely to fall within the 1q43 syntenic region on the human genetic map.

[0169] The mapping of the locus referred to herein as Ncf2-rs was done using primers designed to amplify human cDNA sequence. Ncf2 has been mapped to human Chr 1cen-q32 and to mouse chromosome 1 (Francke U., 1990, Am. J. Hum. Genet. 47:483-92). The primers designed produced a PCR fragment which mapped to the bg region of Chr 13.

[0170] The YAC and BAC coverage across the minimal bg genetic interval gives an estimated physical distance of approximately 1 Mb.

7. EXAMPLE YAC Complementation of the Beige Mutation

[0171] The experiments presented in this Section describe results which have localized the murine bg gene to a specific interval on murine chromosome 13. Specifically, a complementation-based strategy was utilized to identify two overlapping murine yeast artificial chromosomes (YACs) capable of complementing the murine bg mutation. One of these YACs was tested, via cell fusion studies, and found to be capable of complementing Aleutian mink and human Chediak-Higashi syndrome (CHS) mutant phenotypes, thus strongly suggesting that the mouse, human and mink mutant phenotypes are caused by defects in homologous genes.

7.1. Materials and Methods

[0172] Isolation and Characterization of YACs. The primers F:5′-CCAGCCACAGAATACCATCC-3′ and R:5′-GGACATACTCTGCTGCCATC-3′, specific for Nidogen (Nid) amino terminal sequences, were used to screen the Princeton and Whitehead mouse YAC libraries using the following conditions on an Idaho Technologies Thermal Cycler: 20 sec. 94° C. hot start, 94° C. 0 sec./50° C. 0 sec./72° C. 15 sec. for 35 cycles. To identify the positive pools, the PCR products were separated on 2% agarose gels, transferred to Nytran membranes and probed by standard hybridization techniques. This screen resulted in the isolation of YACs 195.A8, 151.H1, C9.E7, and C96.G11. YAC 113.G6 was isolated from the Whitehead library using the primers F:5′-ACCCCAGAACTTGAGAAATAG-3′ and R:5′-TGCTGAGGTGATAGGTTTATG-3′, specific for the Sequence Tagged Site (STS) 195.A8-right end (R), using the above mentioned PCR conditions. Yeast plugs were prepared according to Gnirke et al. (Gnirke, A. et al., 1993, Genomics 15:659-667). YAC DNA was analyzed by Southern blots or PCR to determine STS and Nid content. The sizes of the YACs were determined using pulsed-field gel electrophoresis on a Bio-Rad CHEF DRII, with a pulse time of 10 to 100 seconds.

[0173] YAC End Isolation. One or both end fragments of YACs C9.E7, 195.A8 and 151.H1 were isolated and used to create STS. The end fragments were isolated using inverse PCR according to Joslyn et al. (Joslyn, G., et al., 1991, Cell 66:601-613). Each inverse PCR product was either directly sequenced using the M13-UP and RP sites engineered into the primers, or cloned using Invitrogen's TA cloning kit, and then sequenced using the T7 and SP6 sequencing primers. PCR primers specific for each unique end were created and tested on mouse genomic DNA to determine whether they amplified the expected size product.

[0174] YAC End Analysis. Each YAC end was tested to determine whether it was derived from mouse chromosome 13. For YAC end analysis, all STS were tested against a panel of mouse/hamster somatic cell hybrids, some of which harbored mouse chromosome 13 (Kozak, C. A. et al., 1975, Som. Cell Gen. 1:371-382). Before use, each hybrid was tested with the dinucleotide repeat markers D13Mit44 and D13Mit173 from Research Genetics, which are specific for the bg/Nid region of mouse chromosome 13, to determine whether the relevant region of mouse chromosome 13 was present. Southern blots of these hybrids were then probed with YAC end STS to determine if these markers were present in the mouse chromosome 13 positive hybrids. In some instances, YAC end STS were assayed by PCR. The ends of YAC 195.A8 were further analyzed by genetic mapping onto a panel of interspecific backcrossed beige mice (Jenkins, N. A. et al., 1991, Genomics 9:401-403). Genomic DNA blots of mice from this panel were prepared and hybridized with the single copy STS 195.A8-R and 195.A8-left end (L). The map positions of these two markers were then determined using the program Map Manager v2.6.3 (Manley, K. F., 1993, Mammalian Genome 4:303-313). This analysis placed these two markers in the same genetic interval as bg and Nid.

[0175] Spheroplast fusion and YAC microinjection. All YACs were “retrofitted” with the neomycin resistance gene using the vector pRV1 and homologous recombination in yeast as described by Srivastava and Schlessinger (Srivastava, A. & Schlessinger, D., 1991, Gene 103:53-59). This protocol introduces the neomycin resistance gene and the LYS2 gene into the URA3 gene present in the YAC “right” arm. Spheroplast fusion using retrofitted YACs was performed according to Huxley et al. (Huxley, C. et al., 1991, Genomics 9:742-750) with the following modifications. A 100 ml culture of yeast was grown to an OD₆₀₀ of 3-4 in SD -Lys -Trp. Spheroplasts were prepared using Oxalyticase (Enzogenetics, Oregon) and resuspended in two milliliters of STC (1 M Sorbitol, 10 mM CaCl₂, 10 mM Tris pH 8). bg mouse fibroblasts (3×10⁶ MCHSF2) were fused to 0.5 ml of the spheroplast preparation in 0.4 ml of 50% PEG/10 mM CaCl₂ (Boehringer Mannheim) for 100, 150 and 200 seconds. The fusion reaction was diluted with 4.0 ml of serum-free Dulbecco's, incubated at room temperature for twenty minutes, centrifuged at long and plated into four 100 mm plates. Microinjections were performed according to Gnirke et al. (Gnirke, A. et al., 1993, Genomics 15:659-667). Twenty-four hours after fusion or microinjection, the cells were washed twice with PBS and incubated with Dulbecco's minimal essential media containing 10% FCS and 400-500 μg/ml G418 for three to four weeks. Individual colonies were isolated and expanded to at least 1×10⁶ cells. Genomic DNA was isolated from these colonies using Qiagen Genomic DNA Tips and the DNA used for Southern or PCR analysis. YAC vector sequences which are immediately adjacent to the genomic insert were assayed using PCR primers specific for YAC “left” and “right” arms (Peterson, K. R. et al., 1993, Proc. Natl. Acad. Sci. U.S.A. 90:7593-7597), or by Southern blotting using the YAC vector as a probe. For Southern blotting, ten μg of fibroblast DNA or 2 μg of yeast DNA was cut with HindIII, run on a 0.8% agarose gel, and then transferred to a Nytran membrane. This membrane was then hybridized with the 9.5 kb gel purified HindIII fragment of the retrofitting vector pRV1.

[0176] Fluorescent Microscopy. Cells were examined for lysosomal morphology using fluorescent labeling of lysosomes (Perou, C. M. & Kaplan, J., 1993, Som. Cell Mol. Gen. 19:459-468). Briefly, lysosomes were labeled by incubating cells overnight in Dulbecco's plus 10% FCS with 0.5 mg/ml Lucifer Yellow-CH, followed by two washes in culture medium, and a final 2-6 hour chase in medium alone. Lysosomes were visualized on live cells using standard fluorescent microscopy techniques.

[0177] Somatic Cell Fusions. Aleutian mink or human CHS (GM02075A) fibroblast lysosomes were labeled with Lucifer Yellow-CH, while the lysosomes of the complemented bg mouse fibroblast colony 195-4 were separately labeled with dextran-Texas Red. The two cell populations were trypsinized, mixed together, and fused to one another using UV-inactivated Sendai virus (Perou, C. M. & Kaplan, J., 1993, Som. Cell Mol. Gen. 19:459-468; Schlegel, R. S. & Rechsteiner, M. C., 1975, Cell 5:371-379). The cells were plated and examined twenty-four hours later using fluorescent microscopy. Two photographs of the same field were taken, one to visualize the Lucifer Yellow fluorescence and a second to visualize the dextran-Texas Red. A heterokaryon could be identified by the presence of both dyes within all lysosomes of one cell.

7.2. Results

[0178] The experiments reported herein describe, first, the isolation and characterization of murine YACs lying within the physical region in which the bg gene must reside. Second, a complementation-based strategy was utilized to identify which of the isolated YACs were able to complement the bg phenotype, thus significantly narrowing the region within which the bg gene must be located. Third, one of these murine YACs was tested, via cell fusion studies, and found to be capable of complementing the bg phenotype in cells of other species, namely those of Aleutian mink and human.

[0179] YAC Characterization. As discussed in Section 6, above, and in Jenkins et al. (Jenkins, N. A. et al., 1991, Genomics 9:401-403), the bg gene is located near Nid on chromosome 13. PCR primers specific for Nid were, therefore, utilized to isolate YACs from the bg/Nid region (FIG. 2A). The Princeton and Whitehead mouse YAC libraries were screened, yielding two YACs from each library. Inverse PCR was used to isolate YAC ends, which were sequenced and used to create STS. Each isolated YAC end was tested to determine if it was derived from mouse chromosome 13. Some of these STS were then used to develop a YAC contig across the bg interval. Of the four Nid-positive YACs isolated, 151.H1 and C96.G11 were determined to be unstable and were not used. YACs C9.E7 and 195.A8 were both derived from chromosome 13, mapped to the bg interval, and remained stable with time. A fifth YAC, 113.G6, was isolated from the Whitehead library using the STS 195.A8-R. It was determined that 113.G6 was chimeric, but had significant overlap (500 kb) with 195.A8. For all subsequent experiments, the retrofitted derivatives of YACs C9.E7, 195.A8, and 113.G6 were used.

[0180] Introduction of YACs into bg Mouse Fibroblasts. Retrofitted YACs were introduced by spheroplast fusion (Huxley, C. et al., 1991, Genomics 9:742-750) or microinjection (Gnirke, A. et al., 1993, Genomics 15:659-667) into fibroblasts derived from a C57BL/6J beige mouse (Perou, C. M. & Kaplan, J., 1993, Som. Cell Mol. Gen. 19:459-468). These cells retained the bg phenotype of abnormally large lysosomes with a clustered perinuclear distribution. This phenotype could be corrected by somatic cell fusion with normal cells (Perou, C. M. & Kaplan, J., 1993, Som. Cell Mol. Gen. 19:459-468). Mutant cells containing YACs were selected for resistance to the neomycin analogue G418, and colonies were examined for lysosomal distribution and morphology by fluorescent microscopy using Lucifer Yellow labeling.

[0181] Seven G418 resistant colonies from several independent spheroplast fusions using YAC 195.A8 were obtained. The efficiency of YAC transfer using spheroplast fusion was extremely low as determined by G418 resistance. A frequency of colony formation of 10⁻⁷ was calculated. Southern and PCR analyses (Peterson, K. R. et al., 1993, Proc. Natl. Acad. Sci. U.S.A. 90:7593-7597) confirmed that all resistant colonies contained YAC “right” arm vector sequences. Only three of the seven colonies, however, contained YAC “left” arm vector sequences, indicating that the other four colonies contained only a fragment of the YAC. Of the seven colonies, five showed a complemented phenotype (FIG. 3B). These five colonies included the three complete YAC colonies and two of the fragmented YACs. Complemented cells showed dramatically smaller lysosomes than the parental bg cells (FIG. 3A and 3C). Other features indicating a corrected phenotype included lysosomes which were no longer clustered in the perinuclear region, and the disappearance of tubular lysosomes. Tubular lysosomes are frequently seen in macrophages but are not observed in normal fibroblasts. Tubular lysosomes are seen, however, in bg mouse and Aleutian mink fibroblasts.

[0182] YAC 113.G6 was introduced into murine bg fibroblasts using spheroplast fusion and two independent colonies were obtained. One colony was complemented and contained sequences from both YAC arms. The other colony was not complemented and contained only a fragmented copy of the YAC. YAC C9.E7 was microinjected into the bg mouse cell line and thirty independent colonies were obtained, five of which contained both YAC vector arms as determined by PCR. All of these colonies retained the bg phenotype. Cells resistant to G418 due to YAC introduction, either uncomplemented cells carrying fragmented YACs or C9.E7 microinjected cells, showed no complemented features regardless of the concentration of G418 employed.

[0183] In the complemented colonies isolated using a G418 concentration of 400-500 μg/ml, it was observed that not all cells showed the complemented phenotype. Some colonies appeared to contain a mixture of bg and complemented cells (FIG. 3B). Two possibilities were considered. First, the colonies, as removed from the plate, might contain a mixture of G418-resistant complemented cells and G418-resistant bg cells. Second, the YAC might be unstable and, at the concentration of G418 employed, some cells may lose the YAC and revert to the bg phenotype. To distinguish between these possibilities, the cell line 195-4 was incubated in 800 μg/ml G418 and lysosomal morphology examined after 10 days. Examination of multiple fields and several hundred cells revealed very few (<1%) bg appearing cells (FIG. 3C). When these complemented cells were incubated in the absence of G418 there was a time-dependent return to the bg phenotype. Seven days after the removal of G418, approximately 1.0% of the cells showed the mutant phenotype, but after thirty days greater than 30% of the cells showed the bg phenotype (FIG. 3D). These results demonstrate that complementation of the mutant phenotype was YAC-dependent.

[0184] The fact that not all YACs complement, and that not all spheroplast fusion generated colonies were complemented, suggests that the act of introducing these YACs along with yeast DNA does not cause the reversion of the bg phenotype. Further, the fact that only certain fragmented 195.A8 or 113.G6 YAC molecules failed to correct the phenotype suggests that fragmented YACs can be utilized as part of a strategy to localize the relevant gene.

[0185] Complementation of the CHS Defect in Other Species by a Murine YAC. To analyze the nature of defective genes in different species exhibiting bg phenotypes, both complemented and uncomplemented 195.A8 YAC containing bg cells were fused with cultured Aleutian mink or human CHS derived cells (Perou, C. M. & Kaplan, J., 1993, Som. Cell Mol. Gen. 19:459-468). When Aleutian mink cells were the recipient cell line, complementation occurred only when the complemented bg mouse cell lines were used. Identical results were obtained with the human CHS fibroblast cell line GM02075A. It was necessary to fuse complemented murine cells to mink and human cells as neither mink nor human cells will accept YACs using the spheroplast fusion or microinjection protocols. These results strongly support the hypothesis that a similar gene (or genes) is responsible for the Chediak phenotype.

[0186] It was found that the choice of cell lines was the most important parameter in determining the efficiency of YAC transfer. No G418-resistant colonies were ever obtained using primary human, mouse, or mink fibroblasts as the recipients. Colonies were obtained using the long term cultured bg mouse cell line MCHSF2. A ten to twenty fold increase in the frequency of transformants was obtained using mouse L-cells. These results suggest that increased chromosome instability resulting from long term culture may contribute to increased transformation efficiency.

8. EXAMPLE Positional Cloning of a Candidate Beige Gene

[0187] The Example presented in this Section describes the cloning of a gene, referred to here as the 22B/30B gene, which represents a candidate murine beige (bg) gene. Extending the studies described in Sections 6 and 7, above, the 22B/30B gene was identified via a refinement of the YAC mapping data presented above, coupled with a positional cloning strategy. Characterization of the 22B/30B gene indicated that the gene produces an approximately 12-14 kb mRNA that encodes a novel protein exhibiting strong nucleotide homology to multiple expressed sequence tags (ESTs), including human ESTs.

8.1. Material and Methods

[0188] YAC Characterization. The Princeton and Whitehead mouse YAC libraries were screened by PCR with primers specific for the right end sequence of YAC 195.A8, as described above. This screen resulted in the isolation of two additional YACs, 137.A10 and B27.F7. An additional YAC from this region was isolated from the Princeton library using primers from the end sequences of the cDNA clone 22B, described in Section 8, below. The F primer was 5′-ATTGGCTAGTGTGTGCAGAC-3′ and the R primer was 5′-GAAGCAGATGACTGAGCAGA-3′. PCR reactions were performed on an Idaho Technologies Thermal Cycler under the following conditions: 20 sec. 94° C. hot start, 94° C. 0 sec./55° C. 0 sec./72° C. 30 sec. for 30 cycles.

[0189] All other YAC techniques were as described, above, in Section 7.1.

[0190] Isolation of cDNAs and Preparation of Plugs. Agarose blocks containing yeast chromosomal and YAC 195.A8 DNA were prepared as described in Gnirke et al. (Gnirke, A. et al., 1993, Genomics 15:659-667), loaded in a 1%, 0.5×TBE gel and electrophoresed in a Bio-Rad DRII clamped homogeneous electric field (CHEF) apparatus (Bio-Rad Laboratories, Inc., Hercules, Calif.) at 200 V with a constant pulse time of 60 sec. for 24 hrs. The YAC was excised and purified using the GeneClean II Kit according to manufacturer's instructions (Bio 101, Inc., La Jolla, Calif.). Gel-purified YAC DNA was radiolabelled with ³²P-dCTP by random priming. The hybridization probe was pre-competed with 100 μg of sonicated genomic mouse DNA, 50 μg of mouse COT-1 DNA (GIBCO BRL, Gaithersburg, Md.) and 20 μg of sonicated pYAC55 DNA (Sigma, St. Louis, Mo.) for 2 hrs at 65° C. Filters containing plaques from a C57BL/6J mouse E16.5 cDNA library (Stratagene, La Jolla, Calif.) were prehybridized at 65° C. for 6-8 hrs in RapidHyb buffer (Amersham, Arlington Heights, Ill.) containing 100 μg/ml sonicated mouse genomic DNA, 4 μg/ml COT-1 DNA and 2 μg/ml sonicated pYAC55 DNA. Hybridization proceeded overnight at 65° C. Filters were washed to 0.1×SSC at 65° C. Clones positive after a secondary screen were recovered as phagemids.

[0191] Genomic DNA Isolation and Southern Blots. High molecular weight mouse DNA for Southern blots and PCR analysis was either purchased from the Jackson Labs (Bar Harbor, Me.) or isolated using a Qiagen tip 2500 (Qiagen, Inc., Chatsworth, Calif.). Southern blots were prepared and hybridized according to Jenkins et al. (Jenkins, N. A. et al., 1982, J. Virol. 43:26-36), exposed to Fuji Imaging Plates, Type BAS-IIIS and visualized using a Fujix Bas 1000 Phosphoimager (Fuji Film I & I, Fuji Medical Systems U.S.A., Inc., Stamford, Conn.).

[0192] RNA Isolation, Northern Blots. Total RNA was isolated from various mouse tissues and cultured mouse and human melanoma cells using the RNA STAT-60 reagent (Tel-Test “B”, Inc., Friendswood, Tex.) according to manufacturer's instructions. For Northern blot preparations, 25 μg of this RNA was run on a 1.5% denaturing gel and transferred overnight onto Zeta pore membrane (CUNO, Inc., Meriden, Conn.) in 10×SSC. Filters were hybridized with a gel purified 811 bp HindIII+Pst I fragment from the clone 30B that was radiolabeled with ³²P-dCTP by random priming. Hybridization was performed at 65° C. overnight in QuikHybe Hybridization Solution (Stratagene, La Jolla, Calif.). Filters were washed to 0.1×SSC at 65° C. and visualized by X-ray film autoradiography.

8.2. Results

[0193] YAC characterization. The minimal bg interval was refined by further in vitro complementation of bg murine fibroblasts with additional YACs (FIG. 2B). First, it was demonstrated that YAC151.H1, which contains restriction fragments in common with YAC195.A8, as defined by fingerprinting with COT-1 DNA, was not capable of complementing bg. Furthermore, YAC137.A10, which is nearly identical to YAC113.G6, also failed to complement the bg phenotype. These studies, therefore, demonstrate that the minimal bg region must lie between the proximal end of YAC137.A10 and the distal end of YAC151.H1.

[0194] Isolation of candidate genes in the bg minimal region. The complementing YAC195.A8 (See Section 7, above) was gel purified, radiolabelled and used to isolate clones from an E16.5 day mouse embryo cDNA library. Forty five clones were isolated. Based on sequence analysis and mapping to the YAC-defined physical map, six genes were defined.

[0195] Of particular note was a gene, referred to herein as the 22B/30B gene, defined by two cDNA clones, 30B and 22B. These clones had 447 bp of overlap sequence, with 30B extending more 5′ than 22B, and were located within the region predicted to contain the bg gene. In order to determine whether the cDNA clones 22B and 30B mapped physically to the interval predicted to contain the bg gene, the clones were used as probes against restriction enzyme digested YAC DNA. The non-complementing YAC137.A10 lacked two HindIII 30B hybridizing bands that were present in complementing YAC113.G6. Likewise, YAC151.H1 lacked some HindIII bands hybridizing with 22B.

[0196] Based upon the complementation data, it was predicted that the complete bg gene would lie in the region of overlap between YACs 113.G6 and 195.A8, but would be disrupted or absent from the non-complementing YACs 137.A10 and 151.H1. This was the pattern observed for the 22B/30B gene, making it a candidate for the bg gene.

[0197] Sequence of 22B/30B gene. Sequencing of the two overlapping cDNA clones, 22B and 30B, of the putative bg gene totaled 6831 bp of contiguous sequence (FIG. 4; SEQ ID NO:1). 6559 bp of this was open reading frame followed by a stop codon at nucleotide 6560 and 269 bp of the 3′ untranslated region (with 30B present 5′ of this contiguous sequence relative to 22B).
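The segment lengths quoted above can be checked with simple arithmetic. The sketch below assumes that the stop codon occupies the three nucleotides beginning at position 6560 and that the 3′ untranslated region follows it directly; the variable names are illustrative only.

```python
ORF_BP = 6559        # open reading frame length stated in the text
STOP_CODON_BP = 3    # assumption: stop codon spans nucleotides 6560-6562
UTR3_BP = 269        # 3' untranslated region length stated in the text
TOTAL_BP = 6831      # total contiguous sequence stated in the text

# The three segments should account for the whole contiguous sequence.
segments_sum = ORF_BP + STOP_CODON_BP + UTR3_BP
print(segments_sum, segments_sum == TOTAL_BP)

# Number of complete codons in the open reading frame and any leftover bases.
complete_codons, leftover_bases = divmod(ORF_BP, 3)
print(complete_codons, leftover_bases)
```

Under these assumptions the segments sum to the stated total, and the open reading frame holds 2186 complete codons with one nucleotide left over.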

[0198] The 22B/30B protein sequence predicted from the 22B/30B nucleotide sequence is 2186 amino acids long and represents a novel protein (FIG. 4; SEQ ID NO:2). A BLASTX (1993, Nature Genetics 3:266-272) search with the 22B/30B protein amino acid sequence did, however, identify significant homologies to several sequences. Such sequences included an anonymous S. cerevisiae protein, YCR032w, encoded by a 7 kb mRNA, two novel C. elegans proteins, T01H10.8 and F10F2.1, and a human gene, cell division control protein 4-related protein, CDC4L. Amino acid residues 1520-1807 of the 22B/30B protein sequence exhibited the highest level of amino acid conservation. Within this region, the S. cerevisiae and C. elegans proteins showed 50% identity and 75% similarity to murine 22B/30B. The homology to the human CDC4L protein spanned a shorter segment (22B/30B amino acid residues 1675-1806), but again showed 50% identity.

[0199] A known protein motif was found within the 22B/30B amino acid sequence. Specifically, a WD40 or G protein-beta subunit repeat motif (van der Voorn, L. & Ploegh, H. L., 1992, FEBS Lett. 307:131-134) was found to be located at amino acid residues 2016-2030. This motif was originally identified in the β-subunit of the G-protein transducin (Duronio, R. J. et al., 1992, Proteins 13:41-56), and is thought to be involved in mediating protein-protein interactions (Wang, D. S. et al., 1994, Biochem. Biophys. Res. Comm. 203:29-35). None of the proteins found to be homologous to the 22B/30B protein sequence contain such a motif.

[0200] Comparison of the 22B/30B DNA sequence to the dbEST database identified homologies to ESTs from two human cDNA libraries. Specifically, 22B/30B nucleotides 725-942 were 82% identical to human cDNA clone H51623 isolated from a fetal liver and spleen cDNA library, 22B/30B nucleotides 1530-1596 were 88% identical and 22B/30B nucleotides 1596-1842 were 74% identical to the human cDNA clone H50968 isolated from the same fetal cDNA library. 22B/30B nucleotides 1096-1269 were 89% identical to the cDNA clone Z21358 isolated from an adult human testis library. 22B/30B nucleotides 1092-1164 were 87% identical and nucleotides 1165-1302 were 91% identical to the human cDNA clone Z21296 isolated from the same testis cDNA library. In summary, the 22B/30B sequence from approximately nucleotide 725 to nucleotide 1842 appeared to be highly homologous at the nucleotide level to one or more human gene sequences.
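The EST matches listed above can be collapsed into the regions of the 22B/30B sequence they cover. The following is a minimal sketch, assuming the coordinates exactly as quoted in the text; the `merge_intervals` helper is hypothetical and simply joins overlapping or directly adjacent ranges.

```python
# (start, end, EST clone) in 22B/30B nucleotide coordinates, as quoted in the text.
EST_HITS = [
    (725, 942, "H51623"),
    (1530, 1596, "H50968"),
    (1596, 1842, "H50968"),
    (1096, 1269, "Z21358"),
    (1092, 1164, "Z21296"),
    (1165, 1302, "Z21296"),
]


def merge_intervals(intervals):
    """Merge inclusive integer intervals that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


covered = merge_intervals([(s, e) for s, e, _ in EST_HITS])
print(covered)                          # covered blocks
print(covered[0][0], covered[-1][1])    # outer limits of EST coverage
```

The outer limits of the merged blocks correspond to the approximate 725 to 1842 span summarized in the text, although the coverage within that span is not continuous.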

[0201] Expression of the 22B/30B gene. PCR analysis from reverse transcribed murine mRNA was used for detecting expression of the 22B/30B gene. Such an analysis indicated that the 22B/30B message was expressed in each of the tissues tested, namely liver, spleen, kidney, thymus, muscle, fat, heart, lung, stomach, pancreas and cultured fibroblasts. Using an 811 bp probe from the most 5′ end of the available 22B/30B cDNA sequence, a Northern blot of mRNA from two human melanoma cell lines, WM-115 and WM266-4, and from the mouse B16 melanoma cell line showed hybridization to an approximately 12-14 kb message. It should be noted that the 811 bp probe used overlapped with the portion of the 22B/30B sequence discussed above that exhibits 82% identity to a human EST.

9. EXAMPLE Identification of the Beige Gene Via Beige Mutation Detection

[0202] The Example presented herein describes the successful identification of the bg gene, the homolog of the gene responsible for the human Chediak-Higashi syndrome (CHS), via the sequencing of two independent mutant bg alleles. The mutation detection analysis revealed that the bg gene corresponds to the 22B/30B gene described in Section 8, above.

9.1. Material and Methods

[0203] Southern blot/Genomic DNA isolation. The procedures utilized were as described in Section 8.1, above.

[0204] RT-PCR. RNA was isolated as described, above, in Section 8.1. Reverse transcription-polymerase chain reactions (RT-PCR) were carried out as follows: briefly, 0.5 μg of total RNA was reverse transcribed into cDNA using equal concentrations of random and oligo(dT)₁₅ primers and AMV reverse transcriptase (Promega Corp., Madison, Wis.) in a final volume of 200 μl. One μl of each reaction was amplified with 0.25 μM of each of the appropriate primers. The primers were as follows: 22B-5F-5′-TCTTCTTGTCCTGCCTGATGCT-3′; 22B-D11-5′-GTGCTTCACTTCCTCCAGATC-3′; 22B-D6-5′-GCCTCATTCCAGCGAAGC-3′; 22B-D10-5′-CTGGATAGCAGGTGATGGGTGGTTA-3′.
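To illustrate how the orientation of such primers can be checked against a template, the sketch below tests two of the primers against short excerpts transcribed from the SEQ ID NO:1 listing. The excerpts, the `orientation` helper and the choice of primers are illustrative assumptions, not part of the described procedure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "22B-D6": "GCCTCATTCCAGCGAAGC",
    "22B-D11": "GTGCTTCACTTCCTCCAGATC",
}

# Short coding-strand excerpts transcribed from the SEQ ID NO:1 listing.
EXCERPTS = {
    "22B-D6": "GGACTGGCCTCATTCCAGCGAAGCCAAAGC",
    "22B-D11": "GAATTTGATCTGGAGGAAGTGAAGCACATGGAA",
}


def orientation(primer: str, template: str) -> str:
    """Report whether a primer matches the given strand or its complement."""
    if primer in template:
        return "forward"
    if reverse_complement(primer) in template:
        return "reverse"
    return "no match"


for name, primer in PRIMERS.items():
    print(name, orientation(primer, EXCERPTS[name]))
```

In this check, 22B-D6 reads in the same direction as the coding strand, while 22B-D11 matches only as its reverse complement and so would act as the opposing primer.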

[0205] Amplifications were carried out in a final volume of 25 μl in 1×PCR buffer containing 1.5 mM MgCl₂, 0.5 Units Ampli Taq polymerase (Perkin-Elmer-Cetus). After an initial denaturation step at 94° C. for 2 mins, samples were subjected to 35 cycles of 40 sec at 94° C., 50 sec at 57° C., and 2 mins at 72° C. Following a 10 mins final extension at 72° C., samples were stored at 4° C. PCR products were separated by electrophoresis through 2%, 1×TBE agarose gels.
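For planning purposes, the programmed hold time of the cycling profile above can be totalled as follows. This is a rough sketch that counts only the stated hold times and ignores temperature ramping, which depends on the instrument; the names are illustrative.

```python
INITIAL_DENATURATION_S = 2 * 60          # 94 C for 2 min
CYCLE_STEPS_S = [40, 50, 2 * 60]         # 94 C, 57 C, 72 C holds per cycle
CYCLES = 35
FINAL_EXTENSION_S = 10 * 60              # 72 C for 10 min

per_cycle_s = sum(CYCLE_STEPS_S)
total_s = INITIAL_DENATURATION_S + CYCLES * per_cycle_s + FINAL_EXTENSION_S

print(per_cycle_s, "s per cycle")
print(total_s, "s total,", total_s / 60, "min")
```

The programmed holds alone come to a little over two hours, so the actual run time will be somewhat longer once ramping is included.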

[0206] PCR: Mouse genomic DNA (C57BL/6J, C57BL/6J-bg/bg, Satin/Beige-bg/bg DNA) was amplified using the following primers: 228F: 5′-TGCTGTGGATTATATGAACTC-3′ and 228R: 5′-GGTCTCTATTAGTCCGAGAAC-3′. Amplification parameters were as follows: 2 minutes hot start 94° C., 94° C. 30 seconds/52° C. 30 seconds/72° C. 4 minutes, for 30 cycles on a Perkin-Elmer DNA Thermal-Cycler.

9.2. Results

[0207] bg gene mutation detection. Described herein are bg gene mutation studies which reveal that the gene corresponding to the 22B/30B gene, described above, in Section 8, corresponds to the murine bg gene. Specifically, nucleotide defects within two bg mutant alleles are demonstrated to lie within the 22B/30B region and to result in the production of C terminally truncated proteins.

[0208] The original bg mutation arose in a radiation experiment at the Oak Ridge National Laboratory. Hence it was probably radiation induced and was on a chromosome originating from either the C3H/R1 or the 101/R1 inbred strain of mice. Because the original bg mutation was radiation-induced, it was possible that the mutation could be visible via Southern blot analysis. There have been many subsequent re-mutations of the mouse bg gene, all of which have arisen spontaneously. Some of these are extinct with no surviving tissues or DNA. For others, for example C57BL-bg^(10J) and C57BL-bg^(11J), although the mutation is extinct, there is DNA available (Jackson Laboratories), and for others, the mutation is still available as a live mutant stock, e.g. SJL-bg, C57BL-bg^(J), C3H/HeJ-bg^(2J), and DBA/2J-CO-bg^(8J). Southern blot analysis of these multiple bg alleles and their appropriate normal controls showed no polymorphic bands for probes from either the 5′ or 3′ regions of the 22B/30B gene sequence, although a probe, 22B/30B nucleotides 6489-6719, did make it possible to determine that the original bg allele arose on a C3H-like chromosome, not a 101/R1 derived chromosome. In contrast, when a 510 bp fragment, 22B/30B nucleotides 1618-2127, was used as a probe, the original mutant bg allele showed altered bands for 7 out of 9 enzymes (FIG. 5A-5D).

[0209] PCR primers, as described above, in Section 9.1, were designed which surrounded and spanned the 510 bp region and were used to amplify genomic DNA and cDNA. One primer set, designated 228F and 228R, amplified a 2 kb genomic fragment from C57BL/6J but amplified a 3 kb genomic fragment from the strain carrying the original bg allele. A similar set of primers, 22B-5F and 22B-D11, was used to amplify cDNA prepared from the kidneys of C57BL/6J, C3H/HeJ and SJL/J-bg. A single band of 312 bp was detected in both C57BL/6J and C3H/HeJ. cDNA from the SJL/J-bg mouse, however, produced two bands, 428 bp and 637 bp.

[0210] Both the RT-PCR products, as well as the genomic DNA PCR products, were isolated and directly sequenced using standard procedures. Sequencing of the amplified products revealed that the bg mutation was located within the 22B/30B gene. Specifically, analysis of the amplified sequences revealed that the increased size of the genomic product from the bg allele was the result of an incomplete LINE 1 element (Burton, F. H. et al., 1986, J. Mol. Biol. 187:291-304) insertion into an intron contained within the 22B/30B gene's genomic DNA. As this element contained adventitious splice donor and acceptor sites, two aberrant mRNAs were created that each result in a frame shift. The 428 bp bg RT-PCR product had a 116 bp LINE 1 insertion between 22B/30B nucleotides 2235 and 2236, while analysis of the larger product demonstrated a 325 bp LINE 1 insertion at this same location. Both of these LINE 1 insertions result in the introduction of stop codons and in a 22B/30B protein product that is truncated by 1442 amino acids. See FIG. 6 for a diagram depicting the location of these insertions.
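The sizes of the two aberrant RT-PCR products follow directly from the wild-type product size plus the length of each inserted LINE 1 segment. The short sketch below makes that arithmetic explicit, using only the figures quoted in the text; the variable names are illustrative.

```python
WILD_TYPE_PRODUCT_BP = 312         # band seen in C57BL/6J and C3H/HeJ
LINE1_INSERTIONS_BP = [116, 325]   # insertion lengths found by sequencing
OBSERVED_MUTANT_BANDS_BP = [428, 637]

# Each mutant product should equal the wild-type product plus one insertion.
predicted = [WILD_TYPE_PRODUCT_BP + ins for ins in LINE1_INSERTIONS_BP]
print(predicted, predicted == OBSERVED_MUTANT_BANDS_BP)

# Neither insertion length is a multiple of three, consistent with a frame shift.
print([ins % 3 for ins in LINE1_INSERTIONS_BP])
```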

[0211] Analysis of another bg allele, bg^(8J), by sequencing of an RT-PCR product produced using primers 22B-D6 and 22B-D10, identified a C to T base change creating a stop codon at 22B/30B bp 2027. The mutation resulted in the production of a truncated 22B/30B protein missing the last 1511 amino acids.

[0212] It should be noted that the truncated proteins produced by each of the bg (22B/30B) mutant alleles lack the amino acid sequence homologous to S. cerevisiae, C. elegans and human CDC4L genes and also delete the putative WD40 motif (described, above, in Section 8).
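The observation above can be verified from the residue numbers given in Sections 8 and 9. The sketch below assumes the 2186 residue protein length, the conserved region at residues 1520-1807 and the WD40 motif at residues 2016-2030 reported in Section 8; the dictionary keys are illustrative labels.

```python
FULL_LENGTH_AA = 2186
CONSERVED_REGION = (1520, 1807)   # region homologous to yeast/worm proteins
WD40_MOTIF = (2016, 2030)

# Number of C-terminal residues lost in each mutant allele, as stated in the text.
MISSING_AA = {"original bg (LINE 1 insertion)": 1442, "bg8J (stop codon)": 1511}

for allele, missing in MISSING_AA.items():
    retained = FULL_LENGTH_AA - missing
    lacks_conserved = retained < CONSERVED_REGION[0]
    lacks_wd40 = retained < WD40_MOTIF[0]
    print(allele, retained, lacks_conserved, lacks_wd40)
```

In both cases the native sequence ends well before residue 1520, so neither the conserved region nor the WD40 motif is retained.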

[0213] In summary, two independent bg gene mutations were revealed to lie within the sequence of the 22B/30B gene, thus presenting compelling evidence that the 22B/30B gene does, in fact, correspond to the bg gene.

10. EXAMPLE Identification and Characterization of the Human bg Gene

[0214] The Example presented herein describes the successful identification and characterization of cDNA molecules corresponding to the human beige (bg) gene. Characterization of the isolated cDNA molecules revealed that the human bg gene undergoes alternative splicing, yielding long (putative full length) and short forms of bg transcripts and bg gene products, as described below.

10.1. Materials and Methods

[0215] cDNA cloning. A human retina λgt10 library (Cat. No. HL1132a; Clontech, Palo Alto Calif.) was screened with a mixture of three DNA fragments isolated from mouse beige clones. They were, in order from 5′ to 3′, 30B (bp 82-921 of FIG. 4), 22B (bp 1650-2160 of FIG. 4), and K2+K5 (bp 6520-6750 of FIG. 4).

[0216] The three probes were labeled with ³²P by random priming and hybridized with filters representing 10⁶ clones overnight at 42° C. in Church's buffer (7% SDS, 250 mM NaHPO₄, 2 μM EDTA, 1% BSA). The filters were washed in 2×SSC, 1% SDS at 42° C. Positive plaques were replated and treated in the same manner. Phage DNA was prepared by a standard plate lysate method. After digestion of the phage DNA with EcoRI, cDNA inserts were isolated and subcloned into pBluescript (Stratagene; La Jolla Calif.) for DNA sequencing. DNA sequencing was performed according to standard techniques.

[0217] cDNA identified in the above screening was used to probe a λgt10 human fetal liver library (Cat. No. HL3020a; Clontech, Palo Alto Calif.) and the human retina library described above. Filters representing 10⁶ clones of each library were hybridized at 65° C. overnight with ³²P labelled probe in Church's buffer and washed in 0.1×SSC, 0.1% SDS at 65° C. Positive plaques were replated and rescreened in the same manner. Phage DNA was prepared, and cDNA inserts were isolated and subcloned as described above.

10.2. Results

[0218] In order to identify the human bg gene, murine bg gene sequences were used to screen human cDNA libraries. Screening, phage isolation and details are presented in Section 10.1, above.

[0219] First, murine bg sequences were used to probe a human retina cDNA library. This screen resulted in the identification of a phage containing a 2 kb cDNA insert (designated fvhx004). The cDNA insert was isolated and subcloned. The fvhx004 cDNA insert was then used to rescreen the human retina cDNA library and to screen a λgt10 human fetal liver library, as described in Section 10.1, above. This screen yielded two positive phage from the human fetal liver library. One phage contained a 4.4 kb cDNA insert (designated fvh1006) and the second contained a 6.3 kb cDNA insert (designated fvh1009). A 3 kb subclone of fvh1006 which overlapped the fvh1009 clone was designated fvh1006a. Additional subclones of fvh1006 were designated fvh1006b (a 1 kb subclone) and fvh1006c (a 400 bp subclone). An additional positive phage was also isolated from the human retina library. This phage contained a 2 kb cDNA insert (designated fvhx003a). A 1.1 kb HindIII/EcoRV fragment from fvh1009 was used to rescreen the human retina library. This screen resulted in one positive phage containing a 1.6 kb insert, designated fvhx015.

[0220] The isolated clones were sequenced according to standard procedures. A database search using human bg nucleotide sequence revealed extensive homology to human cDNA clones H51623 (96% identity), Z21358 (99% identity) and Z21296 (97% identity), as indicated in parentheses. These clones were described in Section 8, above.

[0221] Comparison of the human bg sequence with that of mouse bg sequences revealed a 378 base pair region present in mouse sequence which was absent from the sequence obtained from the isolated human clones. PCR of both the retina and liver libraries with primers flanking this sequence, however, revealed that the sequence was present in both these libraries. Sequencing of the resultant PCR products, coupled with the sequence obtained from the isolated clones, produced what is referred to below as the “long” form of bg gene sequence, while the sequence of the isolated clones, alone, yielded what is referred to below as the “short” form of bg gene sequence.

[0222] FIG. 7 presents the long form (putative full length) human bg gene nucleic acid sequence. FIG. 7 further depicts the derived amino acid sequence encoded by the long form (putative full length) human bg gene nucleic acid sequence shown therein. As shown in FIG. 7, the predicted long form human bg gene product contains 3801 amino acid residues. As in the mouse bg gene product described in Section 8, above, the human bg gene product contains a WD40 or G protein-beta subunit repeat motif. In the long form human bg gene product this motif is present at amino acid residues 3694-3708.

[0223] FIG. 8 presents the short form human bg gene nucleic acid sequence. FIG. 8 further depicts the derived amino acid sequence encoded by the short form human bg gene nucleic acid sequence shown therein. As shown in FIG. 8, the predicted short form human bg gene product contains 3672 amino acid residues. It is missing base pairs 7544-7921 of the long form depicted in FIG. 7. The short form bg nucleic acid sequence retains the same frame as the long form throughout its length and encodes a bg gene product which is missing amino acid residues 2451-2577 of the long form depicted in FIG. 7. The WD40 sequence motif is present in the short form bg gene product at amino acid residues 3565-3579 depicted in FIG. 8.
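The statement that the short form retains the reading frame of the long form can be illustrated with the base pair coordinates quoted above. The sketch below assumes the coordinates are inclusive; the variable names are illustrative.

```python
# Base pairs of the long form (FIG. 7) that are absent from the short form.
DELETED_START_BP = 7544
DELETED_END_BP = 7921

# Inclusive span of the missing segment.
deleted_bp = DELETED_END_BP - DELETED_START_BP + 1
print(deleted_bp)

# A deletion preserves the downstream reading frame only if its length
# is a multiple of three.
in_frame = deleted_bp % 3 == 0
print(in_frame)
```

The span works out to the 378 base pair region noted earlier in this Section, and because that length is a multiple of three the downstream codons are read in the same frame in both forms.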

[0224] The present invention is not to be limited in scope by the specific embodiments described herein, which are intended as single illustrations of individual aspects of the invention, and functionally equivalent methods and components are within the scope of the invention. Indeed, various modifications of the invention, in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Such modifications are intended to fall within the scope of the appended claims.

1 32 6830 base pairs nucleic acid single linear DNA Coding Sequence1...6558 1 GCA CGA GGG GAA ATC TCC ATA TGG GTC TCT GGG CAG AGG AAG ACTGAT 48 Ala Arg Gly Glu Ile Ser Ile Trp Val Ser Gly Gln Arg Lys Thr Asp 15 10 15 GTC ATC TTG GAT TTT GTG CTC CCA AGA AAA ACA AGC TTA TCA TCA GAC96 Val Ile Leu Asp Phe Val Leu Pro Arg Lys Thr Ser Leu Ser Ser Asp 20 2530 AGC AAT AAA ACA TTT TGC ATG ATT GGT CAT TGC TTA ACA TCC CAA GAA 144Ser Asn Lys Thr Phe Cys Met Ile Gly His Cys Leu Thr Ser Gln Glu 35 40 45GAG TCT CTG CAA TTA GCT GGA AAA TGG GAC CTG GGG AAC TTG CTC CTC 192 GluSer Leu Gln Leu Ala Gly Lys Trp Asp Leu Gly Asn Leu Leu Leu 50 55 60 TTCAAT GGA GCT AAA ATT GGC TCA CAA GAG GCC TTT TTC CTG TAT GCT 240 Phe AsnGly Ala Lys Ile Gly Ser Gln Glu Ala Phe Phe Leu Tyr Ala 65 70 75 80 TGTGGA CCC AAC TAC ACA TCC ATC ATG CCG TGT AAA TAT GGA CAG CCA 288 Cys GlyPro Asn Tyr Thr Ser Ile Met Pro Cys Lys Tyr Gly Gln Pro 85 90 95 GTC ATTGAC TAC TCC AAA TAC ATT AAT AAA GAC ATT TTG AGA TGT GAT 336 Val Ile AspTyr Ser Lys Tyr Ile Asn Lys Asp Ile Leu Arg Cys Asp 100 105 110 GAA ATCAGA GAC CTT TTT ATG ACC AAG AAA GAA GTG GAT GTT GGT CTC 384 Glu Ile ArgAsp Leu Phe Met Thr Lys Lys Glu Val Asp Val Gly Leu 115 120 125 TTA ATTGAA AGT CTT TCA GTT GTT TAT ACA ACT TGC TGT CCT GCT CAG 432 Leu Ile GluSer Leu Ser Val Val Tyr Thr Thr Cys Cys Pro Ala Gln 130 135 140 TAC ACCATC TAT GAA CCA GTG ATT CGA CTC AAG GGC CAA GTG AAA ACT 480 Tyr Thr IleTyr Glu Pro Val Ile Arg Leu Lys Gly Gln Val Lys Thr 145 150 155 160 CAGCCC TCT CAA AGA CCC TTC AGC TCA AAG GAA GCC CAG AGC ATC TTG 528 Gln ProSer Gln Arg Pro Phe Ser Ser Lys Glu Ala Gln Ser Ile Leu 165 170 175 CTAGAA CCT TCT CAA CTC AAA GGC CTC CAA CCT ACG GAA TGT AAA GCC 576 Leu GluPro Ser Gln Leu Lys Gly Leu Gln Pro Thr Glu Cys Lys Ala 180 185 190 ATCCAG GGC ATT CTG CAT GAG ATT GGT GGG GCT GGC ACA TTT GTT TTT 624 Ile GlnGly Ile Leu His Glu Ile Gly Gly Ala Gly Thr Phe Val Phe 195 200 205 CTCTTT GCT AGG GTT GTT GAA CTT AGT AGC TGT GAA GAA ACT CAA GCA 672 Leu PheAla Arg Val 
Val Glu Leu Ser Ser Cys Glu Glu Thr Gln Ala 210 215 220 TTAGCA CTG CGG GTT ATA CTG TCT TTA ATT AAG TAC AGC CAA CAG AGA 720 Leu AlaLeu Arg Val Ile Leu Ser Leu Ile Lys Tyr Ser Gln Gln Arg 225 230 235 240ACA CAG GAA CTG GAA AAT TGT AAT GGA CTC TCT ATG ATT CAC CAA GTG 768 ThrGln Glu Leu Glu Asn Cys Asn Gly Leu Ser Met Ile His Gln Val 245 250 255TTG GTC AAA CAG AAA TGC ATT GTT GGC TTT CAC ATT TTG AAG ACC CTT 816 LeuVal Lys Gln Lys Cys Ile Val Gly Phe His Ile Leu Lys Thr Leu 260 265 270CTT GAA GGT TGC TGC GGT GAA GAA GTT ATC CAC GTC AGT GAG CAT GGA 864 LeuGlu Gly Cys Cys Gly Glu Glu Val Ile His Val Ser Glu His Gly 275 280 285GAG TTC AAG CTG GAT GTT GAG TCT CAT GCT ATA ATC CAA GAT GTT AAG 912 GluPhe Lys Leu Asp Val Glu Ser His Ala Ile Ile Gln Asp Val Lys 290 295 300CTG CTG CAG GAA CTG TTA CTT GAC TGG AAG ATA TGG AAT AAG GCA GAG 960 LeuLeu Gln Glu Leu Leu Leu Asp Trp Lys Ile Trp Asn Lys Ala Glu 305 310 315320 CAA GGT GTG TGG GAG ACT CTG CTA GCA GCT TTG GAA GTC CTC ATC CGG 1008Gln Gly Val Trp Glu Thr Leu Leu Ala Ala Leu Glu Val Leu Ile Arg 325 330335 GTA GAG CAC CAC CAG CAG CAG TTT AAT ATT AAG CAG TTG CTG AAC GCC 1056Val Glu His His Gln Gln Gln Phe Asn Ile Lys Gln Leu Leu Asn Ala 340 345350 CAC GTG GTT CAC CAC TTC CTA CTG ACC TGT CAG GTT TTA CAG GAA CAC 1104His Val Val His His Phe Leu Leu Thr Cys Gln Val Leu Gln Glu His 355 360365 AGA GAG GGG CAG CTT ACA TCT ATG CCC CGA GAA GTT TGT AGA TCA TTT 1152Arg Glu Gly Gln Leu Thr Ser Met Pro Arg Glu Val Cys Arg Ser Phe 370 375380 GTG AAA ATC ATT GCA GAA GTC CTT GGT TCT CCT CCA GAC TTG GAA TTA 1200Val Lys Ile Ile Ala Glu Val Leu Gly Ser Pro Pro Asp Leu Glu Leu 385 390395 400 TTG ACA GTT ATT TTC AAT TTC CTG TTA GCT GTA CAC CCT CCT ACT AAT1248 Leu Thr Val Ile Phe Asn Phe Leu Leu Ala Val His Pro Pro Thr Asn 405410 415 ACT TAT GTT TGT CAC AAT CCC ACA AAC TTC TAC TTC TCT TTG CAC ATA1296 Thr Tyr Val Cys His Asn Pro Thr Asn Phe Tyr Phe Ser Leu His Ile 420425 430 GAT GGC AAG ATC TTT CAG GAG AAA GTG CAG TCA CTC GCG TAC CTG AGG1344 Asp Gly Lys 
Ile Phe Gln Glu Lys Val Gln Ser Leu Ala Tyr Leu Arg 435440 445 CAT TCT AGC AGC GGA GGG CAA GCC TTT CCC AGC CCT GGA TTC CTG GTA1392 His Ser Ser Ser Gly Gly Gln Ala Phe Pro Ser Pro Gly Phe Leu Val 450455 460 ATA AGC CCA TCT GCC TTT ACT GCA GCT CCT CCT GAA GGA ACC AGT TCT1440 Ile Ser Pro Ser Ala Phe Thr Ala Ala Pro Pro Glu Gly Thr Ser Ser 465470 475 480 TCC AAT ATT GTT CCA CAG CGG ATG GCT GCT CAG ATG GTT CGA TCTAGA 1488 Ser Asn Ile Val Pro Gln Arg Met Ala Ala Gln Met Val Arg Ser Arg485 490 495 AGT CTA CCA GCA TTT CCT ACT TAT TTA CCA CTA ATA CGA GCA CAAAAA 1536 Ser Leu Pro Ala Phe Pro Thr Tyr Leu Pro Leu Ile Arg Ala Gln Lys500 505 510 CTG GCT GCA AGT TTG GGT TTT AGT GTT GAC AAG TTA CAA AAT ATTGCA 1584 Leu Ala Ala Ser Leu Gly Phe Ser Val Asp Lys Leu Gln Asn Ile Ala515 520 525 GAT GCC AAC CCA GAG AAA CAG AAT CTT TTA GGA AGA CCC TAC GCACTG 1632 Asp Ala Asn Pro Glu Lys Gln Asn Leu Leu Gly Arg Pro Tyr Ala Leu530 535 540 AAA ACA AGC AAA GAG GAA GCA TTC ATC AGC AGC TGT GAG TCT GCAAAG 1680 Lys Thr Ser Lys Glu Glu Ala Phe Ile Ser Ser Cys Glu Ser Ala Lys545 550 555 560 ACT GTT TGT GAA ATG GAG GCT CTT CTT GGA GCC CAC GCC TCTGCC AAT 1728 Thr Val Cys Glu Met Glu Ala Leu Leu Gly Ala His Ala Ser AlaAsn 565 570 575 GGG GTT TCC AGA GGA TCA CCG AGG TTC CCC AGG GCC AGA GTAGAT CAC 1776 Gly Val Ser Arg Gly Ser Pro Arg Phe Pro Arg Ala Arg Val AspHis 580 585 590 AAA GAT GTG GGA ACA GAG CCC AGA TCA GAT GAT GAC AGT CCTGGG GAT 1824 Lys Asp Val Gly Thr Glu Pro Arg Ser Asp Asp Asp Ser Pro GlyAsp 595 600 605 GAG TCT TAC CCA CGT CGG CCT GAC AAC CTC AAG GGA CTG GCCTCA TTC 1872 Glu Ser Tyr Pro Arg Arg Pro Asp Asn Leu Lys Gly Leu Ala SerPhe 610 615 620 CAG CGA AGC CAA AGC ACT GTC GCA AGC CTT GGG CTG GCG TTTCCC TCT 1920 Gln Arg Ser Gln Ser Thr Val Ala Ser Leu Gly Leu Ala Phe ProSer 625 630 635 640 CAG AAT GGA TCT GCA GTT GCT AGC AGG TGG CCA AGT CTTGTT GAT AGG 1968 Gln Asn Gly Ser Ala Val Ala Ser Arg Trp Pro Ser Leu ValAsp Arg 645 650 655 AAT GCT GAT GAC TGG GAG AAC TTT ACC TTT TCT CCT GCTTAT GAG GCA 
2016 Asn Ala Asp Asp Trp Glu Asn Phe Thr Phe Ser Pro Ala TyrGlu Ala 660 665 670 AGC TAC AAC CGA GCC ACA AGC ACC CAC AGT GTC ATT GAAGAC TGT CTG 2064 Ser Tyr Asn Arg Ala Thr Ser Thr His Ser Val Ile Glu AspCys Leu 675 680 685 ATA CCT ATC TGC TGT GGA TTA TAT GAA CTC TTA AGT GGGGTT CTT CTT 2112 Ile Pro Ile Cys Cys Gly Leu Tyr Glu Leu Leu Ser Gly ValLeu Leu 690 695 700 GTC CTG CCT GAT GCT ATG CTT GAA GAT GTG ATG GAC AGGATT ATT CAA 2160 Val Leu Pro Asp Ala Met Leu Glu Asp Val Met Asp Arg IleIle Gln 705 710 715 720 GCA GAT ATT CTT CTA GTC CTT GTT AAC CAC CCA TCACCT GCT ATC CAG 2208 Ala Asp Ile Leu Leu Val Leu Val Asn His Pro Ser ProAla Ile Gln 725 730 735 CAA GGA GTA ATT AAA CTG TTA CAT GCA TAC ATT AATAGA GCA TCA AAG 2256 Gln Gly Val Ile Lys Leu Leu His Ala Tyr Ile Asn ArgAla Ser Lys 740 745 750 GAG CAA AAG GAC AAG TTT CTG AAG AAC CGT GGC TTTTCC TTA TTA GCC 2304 Glu Gln Lys Asp Lys Phe Leu Lys Asn Arg Gly Phe SerLeu Leu Ala 755 760 765 AAC CAG TTG TAT CTT CAT AGG GGA ACT CAG GAG TTGTTG GAG TGC TTT 2352 Asn Gln Leu Tyr Leu His Arg Gly Thr Gln Glu Leu LeuGlu Cys Phe 770 775 780 GTT GAA ATG TTC TTT GGT CGA CCG ATT GGC CTG GATGAA GAA TTT GAT 2400 Val Glu Met Phe Phe Gly Arg Pro Ile Gly Leu Asp GluGlu Phe Asp 785 790 795 800 CTG GAG GAA GTG AAG CAC ATG GAA CTG TTC CAGAAG TGG TCT GTC ATT 2448 Leu Glu Glu Val Lys His Met Glu Leu Phe Gln LysTrp Ser Val Ile 805 810 815 CCC GTT CTC GGA CTA ATA GAG ACC TCT CTC TATGAC AAT GTC CTC TTG 2496 Pro Val Leu Gly Leu Ile Glu Thr Ser Leu Tyr AspAsn Val Leu Leu 820 825 830 CAC AAT GCT CTT TTA CTT CTT CTG CAA GTT TTAAAC TCT TGT TCC AAG 2544 His Asn Ala Leu Leu Leu Leu Leu Gln Val Leu AsnSer Cys Ser Lys 835 840 845 GTA GCA GAC ATG CTA CTG GAC AAT GGT CTA CTCTAT GTA TTA TGT AAT 2592 Val Ala Asp Met Leu Leu Asp Asn Gly Leu Leu TyrVal Leu Cys Asn 850 855 860 ACA GTA GCA GCC CTG AAT GGA TTA GAA AAG AACATT CCT GTG AAC GAA 2640 Thr Val Ala Ala Leu Asn Gly Leu Glu Lys Asn IlePro Val Asn Glu 865 870 875 880 TAC AAA TTG CTC GCA TGT GAT ATA CAG CAGCTT TTC 
ATA GCA GTT ACA 2688 Tyr Lys Leu Leu Ala Cys Asp Ile Gln Gln LeuPhe Ile Ala Val Thr 885 890 895 ATT CAT GCT TGC AGT TCC TCA GGC ACA CAGTAT TTT AGA GTG ATT GAA 2736 Ile His Ala Cys Ser Ser Ser Gly Thr Gln TyrPhe Arg Val Ile Glu 900 905 910 GAC CTT ATT GTA CTT CTT GGA TAT CTT CATAAT AGC AAA AAC AAG AGG 2784 Asp Leu Ile Val Leu Leu Gly Tyr Leu His AsnSer Lys Asn Lys Arg 915 920 925 ACA CAA AAT ATG GCT TTG GCC CTG CAG CTTAGA GTT CTC CAG GCT GCT 2832 Thr Gln Asn Met Ala Leu Ala Leu Gln Leu ArgVal Leu Gln Ala Ala 930 935 940 TTG GAA TTT ATA AGG AGC ACA GCC AAT CATGAC TCT GAA AGT CCA GTG 2880 Leu Glu Phe Ile Arg Ser Thr Ala Asn His AspSer Glu Ser Pro Val 945 950 955 960 CAC TCG CCT TCT GCC CAC CGC CAT TCAGTG CCT CCG AAG CGG AGA AGC 2928 His Ser Pro Ser Ala His Arg His Ser ValPro Pro Lys Arg Arg Ser 965 970 975 ATT GCT GGT TCT CGC AAA TTC CCT CTGGCT CAG ACA GAG TCT CTG CTG 2976 Ile Ala Gly Ser Arg Lys Phe Pro Leu AlaGln Thr Glu Ser Leu Leu 980 985 990 ATG AAG ATG CGC TCA GTG GCC AGC GATGAG CTA CAC TCT ATG ATG CAG 3024 Met Lys Met Arg Ser Val Ala Ser Asp GluLeu His Ser Met Met Gln 995 1000 1005 AGG AGG ATG AGC CAA GAG CAC CCCAGC CAG GCC TCG GAG GCA GAG CTC 3072 Arg Arg Met Ser Gln Glu His Pro SerGln Ala Ser Glu Ala Glu Leu 1010 1015 1020 GCT CAG AGG CTG CAG AGG CTCACC ATC TTA GCT GTG AAC AGG ATT ATT 3120 Ala Gln Arg Leu Gln Arg Leu ThrIle Leu Ala Val Asn Arg Ile Ile 025 1030 1035 1040 TAC CAA GAG TTG AATTCA GAT ATT ATT GAC ATT TTG AGA ACT CCA GAA 3168 Tyr Gln Glu Leu Asn SerAsp Ile Ile Asp Ile Leu Arg Thr Pro Glu 1045 1050 1055 AAT ACA TCC CAAAGC AAG ACC TCA GTT TCT CAG ACT GAA ATT TCT GAA 3216 Asn Thr Ser Gln SerLys Thr Ser Val Ser Gln Thr Glu Ile Ser Glu 1060 1065 1070 GAA GAC ATGCAT CAT GAG CAA CCT TCT GTA TAT AAT CCA TTT CAA AAA 3264 Glu Asp Met HisHis Glu Gln Pro Ser Val Tyr Asn Pro Phe Gln Lys 1075 1080 1085 GAA ATGTTA ACC TAT CTG TTG GAT GGC TTC AAA GTG TGT ATT GGT TCA 3312 Glu Met LeuThr Tyr Leu Leu Asp Gly Phe Lys Val Cys Ile Gly Ser 1090 1095 1100 AGTAAA ACT AGC 
GTT TCT AAG CAG CAG TGG ACT AAA ATC CTG GGG TCT 3360 Ser LysThr Ser Val Ser Lys Gln Gln Trp Thr Lys Ile Leu Gly Ser 105 1110 11151120 TGT AAA GAA ACC CTC CGA GAC CAG CTT GGA AGA TTG CTA GCG CAT ATT3408 Cys Lys Glu Thr Leu Arg Asp Gln Leu Gly Arg Leu Leu Ala His Ile1125 1130 1135 TTG TCT CCA ACC CAC ACT GTA CAA GAA CGG AAG CAG ATA CTTGAG ATA 3456 Leu Ser Pro Thr His Thr Val Gln Glu Arg Lys Gln Ile Leu GluIle 1140 1145 1150 GTT CAT GAA CCA GCT CAC CAG GAT ATA CTT CGT GAC TGTCTT AGC CCC 3504 Val His Glu Pro Ala His Gln Asp Ile Leu Arg Asp Cys LeuSer Pro 1155 1160 1165 TCC CCA CAA CAT GGA GCC AAG TTG GTT TTG TAT TTGTCA GAG TTG ATA 3552 Ser Pro Gln His Gly Ala Lys Leu Val Leu Tyr Leu SerGlu Leu Ile 1170 1175 1180 CAT AAT CAT CAG GAT GAG TTA AGT GAA GAA GAAATG GAC ACA GCA GAA 3600 His Asn His Gln Asp Glu Leu Ser Glu Glu Glu MetAsp Thr Ala Glu 185 1190 1195 1200 CTG CTT ATG AAT GCT CTA AAG TTA TGTGGC CAC AAG TGC ATC CCG CCC 3648 Leu Leu Met Asn Ala Leu Lys Leu Cys GlyHis Lys Cys Ile Pro Pro 1205 1210 1215 AGT GCC CCT TCC AAA CCA GAG CTCATT AAG ATC ATC AGA GAG GAG CAA 3696 Ser Ala Pro Ser Lys Pro Glu Leu IleLys Ile Ile Arg Glu Glu Gln 1220 1225 1230 AAG AAG TAT GAA AGT GAA GAGAGT GTG AGC AAA GGC TCA TGG CAG AAA 3744 Lys Lys Tyr Glu Ser Glu Glu SerVal Ser Lys Gly Ser Trp Gln Lys 1235 1240 1245 ACG GTG AAC AAC AAC CAGCAA AGT CTC TTC CAG AGG CTC GAT TTC AAA 3792 Thr Val Asn Asn Asn Gln GlnSer Leu Phe Gln Arg Leu Asp Phe Lys 1250 1255 1260 TCC AAG GAT ATA TCTAAA ATC GCT GCA GAC ATC ACC CAG GCT GTA TCA 3840 Ser Lys Asp Ile Ser LysIle Ala Ala Asp Ile Thr Gln Ala Val Ser 265 1270 1275 1280 CTC TCC CAAGGC ATT GAA AGG AAG AAG GTG ATC CAG CAC ATC AGA GGG 3888 Leu Ser Gln GlyIle Glu Arg Lys Lys Val Ile Gln His Ile Arg Gly 1285 1290 1295 ATG TACAAA GTT GAC CTG AGT GCC AGC AGG CAC TGG CAG GAA TGC ATC 3936 Met Tyr LysVal Asp Leu Ser Ala Ser Arg His Trp Gln Glu Cys Ile 1300 1305 1310 CAGCAG CTG ACA CAT GAC AGA GCA GTC TGG TAT GAC CCA ATC TAC TAT 3984 Gln GlnLeu Thr His Asp Arg Ala Val 
Trp Tyr Asp Pro Ile Tyr Tyr 1315 1320 1325 CCA ACT TCA TGG CAG TTG GAT CCA ACA GAA GGG CCA AAC CGA GAG AGG 4032 Pro Thr Ser Trp Gln Leu Asp Pro Thr Glu Gly Pro Asn Arg Glu Arg 1330 1335 1340 AGA CGT TTG CAG AGA TGC TAT CTA ACT ATT CCC AAT AAG TAC CTC CTG 4080 Arg Arg Leu Gln Arg Cys Tyr Leu Thr Ile Pro Asn Lys Tyr Leu Leu 1345 1350 1355 1360 AGG GAC AGA CAG AAG TCA GAA GGT GTG CTC AGG CCC CCA CTC TCT TAC 4128 Arg Asp Arg Gln Lys Ser Glu Gly Val Leu Arg Pro Pro Leu Ser Tyr 1365 1370 1375 CTT TTT GAA GAT AAA ACT CAT TCT TCC TTC TCC TCT ACT GTC AAA GAC 4176 Leu Phe Glu Asp Lys Thr His Ser Ser Phe Ser Ser Thr Val Lys Asp 1380 1385 1390 AAA GCT GCA AGT GAA TCC ATC AGA GTG AAT CGA AGA TGT ATC AGT GTT 4224 Lys Ala Ala Ser Glu Ser Ile Arg Val Asn Arg Arg Cys Ile Ser Val 1395 1400 1405 GCA CCA TCT AGA GAG ACA GCT GGG GAA TTG TTG TTA GGT AAA TGT GGG 4272 Ala Pro Ser Arg Glu Thr Ala Gly Glu Leu Leu Leu Gly Lys Cys Gly 1410 1415 1420 ATG TAC TTT GTG GAA GAC AAT GCC TCT GAC GCA GTT GAA AGC TCG AGC 4320 Met Tyr Phe Val Glu Asp Asn Ala Ser Asp Ala Val Glu Ser Ser Ser 1425 1430 1435 1440 CTC CAA GGG GAG TTA GAG CCG GCA TCA TTT TCT TGG ACA TAT GAG GAA 4368 Leu Gln Gly Glu Leu Glu Pro Ala Ser Phe Ser Trp Thr Tyr Glu Glu 1445 1450 1455 ATT AAA GAA GTT CAC AGG CGC TGG TGG CAA CTA AGA GAT AAT GCT GTA 4416 Ile Lys Glu Val His Arg Arg Trp Trp Gln Leu Arg Asp Asn Ala Val 1460 1465 1470 GAA ATC TTT TTA ACA AAT GGC AGA ACA CTC CTA TTA GCA TTT GAC AAT 4464 Glu Ile Phe Leu Thr Asn Gly Arg Thr Leu Leu Leu Ala Phe Asp Asn 1475 1480 1485 AAC AAG GTT CGT GAT GAC GTG TAC CAG AGC ATC CTC ACA AAT AAC CTC 4512 Asn Lys Val Arg Asp Asp Val Tyr Gln Ser Ile Leu Thr Asn Asn Leu 1490 1495 1500 CCA AAT CTT CTG GAG TAC GGC AAC ATC ACC GCT CTG ACA AAC CTG TGG 4560 Pro Asn Leu Leu Glu Tyr Gly Asn Ile Thr Ala Leu Thr Asn Leu Trp 1505 1510 1515 1520 TAT TCT GGA CAA ATT ACC AAT TTT GAA TAT TTG ACT CAT TTA AAC AAG 4608 Tyr Ser Gly Gln Ile Thr Asn Phe Glu Tyr Leu Thr His Leu Asn Lys 1525 1530 1535 CAT GCG GGC CGG TCC TTC AAT GAT CTC ATG CAG TAC
CCG GTG TTC CCC 4656 His Ala Gly Arg Ser Phe Asn Asp Leu Met Gln Tyr Pro Val Phe Pro 1540 1545 1550 TTC ATC CTT TCT GAC TAT GTT AGT GAG ACT CTT GAC CTC AAT GAT CCA 4704 Phe Ile Leu Ser Asp Tyr Val Ser Glu Thr Leu Asp Leu Asn Asp Pro 1555 1560 1565 TCT ATC TAC AGA AAC CTA TCT AAG CCT ATA GCT GTG CAG TAT AAA GAA 4752 Ser Ile Tyr Arg Asn Leu Ser Lys Pro Ile Ala Val Gln Tyr Lys Glu 1570 1575 1580 AAA GAA GAC CGT TAC GTT GAC ACA TAC AAG TAC TTG GAG GAG GAG TAT 4800 Lys Glu Asp Arg Tyr Val Asp Thr Tyr Lys Tyr Leu Glu Glu Glu Tyr 1585 1590 1595 1600 CGC AAG GGA GCT CGA GAG GAT GAC CCC ATG CCT CCT GTG CAA CCC TAC 4848 Arg Lys Gly Ala Arg Glu Asp Asp Pro Met Pro Pro Val Gln Pro Tyr 1605 1610 1615 CAC TAT GGC TCC CAC TAC TCC AAC AGC GGC ACC GTG CTC CAC TTC CTG 4896 His Tyr Gly Ser His Tyr Ser Asn Ser Gly Thr Val Leu His Phe Leu 1620 1625 1630 GTC AGG ATG CCG CCT TTC ACT AAA ATG TTT CTA GCC TAT CAA GAT CAG 4944 Val Arg Met Pro Pro Phe Thr Lys Met Phe Leu Ala Tyr Gln Asp Gln 1635 1640 1645 AGT TTC GAC ATT CCA GAC CGA ACA TTT CAT TCT ACA AAC ACA ACT TGG 4992 Ser Phe Asp Ile Pro Asp Arg Thr Phe His Ser Thr Asn Thr Thr Trp 1650 1655 1660 CGC CTC TCC TCC TTT GAG TCC ATG ACT GAT GTG AAG GAG CTG ATT CCA 5040 Arg Leu Ser Ser Phe Glu Ser Met Thr Asp Val Lys Glu Leu Ile Pro 1665 1670 1675 1680 GAG TTT TTC TAT CTT CCT GAG TTC TTA GTG AAC CGT GAA GGC TTT GAC 5088 Glu Phe Phe Tyr Leu Pro Glu Phe Leu Val Asn Arg Glu Gly Phe Asp 1685 1690 1695 TTC GGT GTT CGT CAG AAT GGA GAG CGG GTT AAC CAC GTC AAT CTT CCT 5136 Phe Gly Val Arg Gln Asn Gly Glu Arg Val Asn His Val Asn Leu Pro 1700 1705 1710 CCC TGG GCA CGC AAC GAT CCT CGG CTG TTC ATC CTT ATT CAC CGG CAA 5184 Pro Trp Ala Arg Asn Asp Pro Arg Leu Phe Ile Leu Ile His Arg Gln 1715 1720 1725 GCA CTA GAG TCT GAC CAT GTG TCC CAG AAC ATC TGT CAC TGG ATC GAC 5232 Ala Leu Glu Ser Asp His Val Ser Gln Asn Ile Cys His Trp Ile Asp 1730 1735 1740 TTA GTG TTT GGC TAC AAG CAA AAG GGG AAG GCG TCT GTT CAA GCC ATC 5280 Leu Val Phe Gly Tyr Lys Gln Lys Gly Lys Ala Ser Val Gln Ala Ile 1745 1750
1755 1760 AAT GTC TTC CAC CCT GCT ACA TAT TTT GGA ATG GAT GTC TCT GCA GTT 5328 Asn Val Phe His Pro Ala Thr Tyr Phe Gly Met Asp Val Ser Ala Val 1765 1770 1775 GAA GAT CCA GTG CAG AGA CGG GCT TTA GAA ACC ATG ATA AAA ACC TAC 5376 Glu Asp Pro Val Gln Arg Arg Ala Leu Glu Thr Met Ile Lys Thr Tyr 1780 1785 1790 GGG CAG ACC CCA CGT CAG TTG TTC CAC ACA GCC CAT GCC AGC CGA CCT 5424 Gly Gln Thr Pro Arg Gln Leu Phe His Thr Ala His Ala Ser Arg Pro 1795 1800 1805 GGA GCC AAG CTT AAC ATC GAA GGA GAG CTT CCA GCA GCT GTT GGC TTG 5472 Gly Ala Lys Leu Asn Ile Glu Gly Glu Leu Pro Ala Ala Val Gly Leu 1810 1815 1820 TTA GTC CAG TTC GCT TTC AGA GAG ACC CGA GAA CCA GTC AAG GAA GTC 5520 Leu Val Gln Phe Ala Phe Arg Glu Thr Arg Glu Pro Val Lys Glu Val 1825 1830 1835 1840 ACT CAT CCG AGC CCT TTG TCA TGG ATA AAA GGC TTG AAG TGG GGG GAG 5568 Thr His Pro Ser Pro Leu Ser Trp Ile Lys Gly Leu Lys Trp Gly Glu 1845 1850 1855 TAC GTA GGT TCC CCC AGT GCT CCA GTA CCT GTG GTC TGC TTC AGC CAG 5616 Tyr Val Gly Ser Pro Ser Ala Pro Val Pro Val Val Cys Phe Ser Gln 1860 1865 1870 CCC CAT GGA GAA AGA TTT GGT TCC CTG CAG GCA CTG CCC ACC AGA GCC 5664 Pro His Gly Glu Arg Phe Gly Ser Leu Gln Ala Leu Pro Thr Arg Ala 1875 1880 1885 ATC TGT GGT TTA TCA CGA AAC TTC TGT CTT CTG ATG ACC TAC AAC AAG 5712 Ile Cys Gly Leu Ser Arg Asn Phe Cys Leu Leu Met Thr Tyr Asn Lys 1890 1895 1900 GAG CAA GGT GTG AGA AGC ATG AAC AAC ACC AAT ATT CAG TGG TCT GCT 5760 Glu Gln Gly Val Arg Ser Met Asn Asn Thr Asn Ile Gln Trp Ser Ala 1905 1910 1915 1920 ATC CTA AGC TGG GGA TAT GCT GAC AAC ATC TTA CGG TTG AAA AGT AAG 5808 Ile Leu Ser Trp Gly Tyr Ala Asp Asn Ile Leu Arg Leu Lys Ser Lys 1925 1930 1935 CAG AGT GAG CCA CCA ATC AAC TTC ATT CAG AGT TCA CAG CAG CAC CAG 5856 Gln Ser Glu Pro Pro Ile Asn Phe Ile Gln Ser Ser Gln Gln His Gln 1940 1945 1950 GTA ACC AGT TGT GCC TGG GTG CCT GAC AGT TGT CAG CTC TTC ACT GGG 5904 Val Thr Ser Cys Ala Trp Val Pro Asp Ser Cys Gln Leu Phe Thr Gly 1955 1960 1965 AGC AAG TGT GGT GTC ATC ACA GCC TAT ACC AAC AGG CTC ACC AGC AGC 5952 Ser Lys Cys Gly
Val Ile Thr Ala Tyr Thr Asn Arg Leu Thr Ser Ser 1970 1975 1980 ACG CCC TCA GAA ATT GAA ATG GAG AGT CAG ATG CAT CTC TAT GGA CAC 6000 Thr Pro Ser Glu Ile Glu Met Glu Ser Gln Met His Leu Tyr Gly His 1985 1990 1995 2000 ACA GAG GAG ATC ACC GGC TTA TGT GTC TGC AAG CCG TAC AGC GTG ATG 6048 Thr Glu Glu Ile Thr Gly Leu Cys Val Cys Lys Pro Tyr Ser Val Met 2005 2010 2015 ATA AGC GTG AGC AGA GAC GGG ACC TGC ATA GTA TGG GAC CTG AAC AGG 6096 Ile Ser Val Ser Arg Asp Gly Thr Cys Ile Val Trp Asp Leu Asn Arg 2020 2025 2030 CTG TGC TAT GTA CAA AGT TTG GCT GGA CAC AAA AGC CCT GTG ACG GCT 6144 Leu Cys Tyr Val Gln Ser Leu Ala Gly His Lys Ser Pro Val Thr Ala 2035 2040 2045 GTC TCT GCC AGT GAA ACG TCA GGT GAC ATT GCT ACT GTG TGT GAC TCA 6192 Val Ser Ala Ser Glu Thr Ser Gly Asp Ile Ala Thr Val Cys Asp Ser 2050 2055 2060 GCT GGC GGG GGC AGT GAC CTG AGA CTC TGG ACC GTG AAT GGG GAC CTC 6240 Ala Gly Gly Gly Ser Asp Leu Arg Leu Trp Thr Val Asn Gly Asp Leu 2065 2070 2075 2080 GTT GGA CAT GTC CAC TGC AGA GAG ATC ATT TGT TCT GTA GCT TTC TCC 6288 Val Gly His Val His Cys Arg Glu Ile Ile Cys Ser Val Ala Phe Ser 2085 2090 2095 AAC CAG CCT GAG GGA GTC TCC ATC AAC GTC ATT GCT GGG GGA TTA GAA 6336 Asn Gln Pro Glu Gly Val Ser Ile Asn Val Ile Ala Gly Gly Leu Glu 2100 2105 2110 AAT GGC ATT GTA AGG CTA TGG AGC ACA TGG GAC TTG AAG CCT GTG AGA 6384 Asn Gly Ile Val Arg Leu Trp Ser Thr Trp Asp Leu Lys Pro Val Arg 2115 2120 2125 GAG ATT ACA TTT CCC AAA TCA AAT AAG CCC ATC ATA AGC CTG ACA TTC 6432 Glu Ile Thr Phe Pro Lys Ser Asn Lys Pro Ile Ile Ser Leu Thr Phe 2130 2135 2140 TCC TGT GAT GGC CAC CAT TTG TAC ACT GCC AAC AGT GAG GGG ACA GTG 6480 Ser Cys Asp Gly His His Leu Tyr Thr Ala Asn Ser Glu Gly Thr Val 2145 2150 2155 2160 ATC GCA TGG TGC CGG AAG GAC CAG CAG CGT GTG AAG CTG CCC ATG TTC 6528 Ile Ala Trp Cys Arg Lys Asp Gln Gln Arg Val Lys Leu Pro Met Phe 2165 2170 2175 TAC TCT TTC CTC AGC AGC TAC GCA GCT GGA TGAAGAGAAG GAGTGTCCCC AGA 6581 Tyr Ser Phe Leu Ser Ser Tyr Ala Ala Gly 2180 2185 GGACATAAGC ACCGCTCTGC GAGCCTGGCT CCACCAACTG CAGAAGCAGA
TGACTGACA 6641 GATATCCAGG AAAGACAACA CACGTGCCTC TGTGCGCGCT TCCCCAGCCT CCGTGGGCT 6701 GAGAGTAAAG CCCTGCCCTC ATTCCATAAT GGCGTGGAAG GCTGGGTCTG CACACACAG 6761 CCAATTAAAG TCAGAATCTT GATGCTTTTT CCCAAAAGGT TAGGCTGAAT CAAAGATAG 6821 GCTCGTGCC 6830 2186 amino acids amino acid unknown protein internal 2 Ala Arg Gly Glu Ile Ser Ile Trp Val Ser Gly Gln Arg Lys Thr Asp 1 5 10 15 Val Ile Leu Asp Phe Val Leu Pro Arg Lys Thr Ser Leu Ser Ser Asp 20 25 30 Ser Asn Lys Thr Phe Cys Met Ile Gly His Cys Leu Thr Ser Gln Glu 35 40 45 Glu Ser Leu Gln Leu Ala Gly Lys Trp Asp Leu Gly Asn Leu Leu Leu 50 55 60 Phe Asn Gly Ala Lys Ile Gly Ser Gln Glu Ala Phe Phe Leu Tyr Ala 65 70 75 80 Cys Gly Pro Asn Tyr Thr Ser Ile Met Pro Cys Lys Tyr Gly Gln Pro 85 90 95 Val Ile Asp Tyr Ser Lys Tyr Ile Asn Lys Asp Ile Leu Arg Cys Asp 100 105 110 Glu Ile Arg Asp Leu Phe Met Thr Lys Lys Glu Val Asp Val Gly Leu 115 120 125 Leu Ile Glu Ser Leu Ser Val Val Tyr Thr Thr Cys Cys Pro Ala Gln 130 135 140 Tyr Thr Ile Tyr Glu Pro Val Ile Arg Leu Lys Gly Gln Val Lys Thr 145 150 155 160 Gln Pro Ser Gln Arg Pro Phe Ser Ser Lys Glu Ala Gln Ser Ile Leu 165 170 175 Leu Glu Pro Ser Gln Leu Lys Gly Leu Gln Pro Thr Glu Cys Lys Ala 180 185 190 Ile Gln Gly Ile Leu His Glu Ile Gly Gly Ala Gly Thr Phe Val Phe 195 200 205 Leu Phe Ala Arg Val Val Glu Leu Ser Ser Cys Glu Glu Thr Gln Ala 210 215 220 Leu Ala Leu Arg Val Ile Leu Ser Leu Ile Lys Tyr Ser Gln Gln Arg 225 230 235 240 Thr Gln Glu Leu Glu Asn Cys Asn Gly Leu Ser Met Ile His Gln Val 245 250 255 Leu Val Lys Gln Lys Cys Ile Val Gly Phe His Ile Leu Lys Thr Leu 260 265 270 Leu Glu Gly Cys Cys Gly Glu Glu Val Ile His Val Ser Glu His Gly 275 280 285 Glu Phe Lys Leu Asp Val Glu Ser His Ala Ile Ile Gln Asp Val Lys 290 295 300 Leu Leu Gln Glu Leu Leu Leu Asp Trp Lys Ile Trp Asn Lys Ala Glu 305 310 315 320 Gln Gly Val Trp Glu Thr Leu Leu Ala Ala Leu Glu Val Leu Ile Arg 325 330 335 Val Glu His His Gln Gln Gln Phe Asn Ile Lys Gln Leu Leu Asn Ala 340 345 350 His Val Val His His Phe Leu Leu Thr Cys Gln Val Leu Gln Glu
His 355 360 365 ArgGlu Gly Gln Leu Thr Ser Met Pro Arg Glu Val Cys Arg Ser Phe 370 375 380Val Lys Ile Ile Ala Glu Val Leu Gly Ser Pro Pro Asp Leu Glu Leu 385 390395 400 Leu Thr Val Ile Phe Asn Phe Leu Leu Ala Val His Pro Pro Thr Asn405 410 415 Thr Tyr Val Cys His Asn Pro Thr Asn Phe Tyr Phe Ser Leu HisIle 420 425 430 Asp Gly Lys Ile Phe Gln Glu Lys Val Gln Ser Leu Ala TyrLeu Arg 435 440 445 His Ser Ser Ser Gly Gly Gln Ala Phe Pro Ser Pro GlyPhe Leu Val 450 455 460 Ile Ser Pro Ser Ala Phe Thr Ala Ala Pro Pro GluGly Thr Ser Ser 465 470 475 480 Ser Asn Ile Val Pro Gln Arg Met Ala AlaGln Met Val Arg Ser Arg 485 490 495 Ser Leu Pro Ala Phe Pro Thr Tyr LeuPro Leu Ile Arg Ala Gln Lys 500 505 510 Leu Ala Ala Ser Leu Gly Phe SerVal Asp Lys Leu Gln Asn Ile Ala 515 520 525 Asp Ala Asn Pro Glu Lys GlnAsn Leu Leu Gly Arg Pro Tyr Ala Leu 530 535 540 Lys Thr Ser Lys Glu GluAla Phe Ile Ser Ser Cys Glu Ser Ala Lys 545 550 555 560 Thr Val Cys GluMet Glu Ala Leu Leu Gly Ala His Ala Ser Ala Asn 565 570 575 Gly Val SerArg Gly Ser Pro Arg Phe Pro Arg Ala Arg Val Asp His 580 585 590 Lys AspVal Gly Thr Glu Pro Arg Ser Asp Asp Asp Ser Pro Gly Asp 595 600 605 GluSer Tyr Pro Arg Arg Pro Asp Asn Leu Lys Gly Leu Ala Ser Phe 610 615 620Gln Arg Ser Gln Ser Thr Val Ala Ser Leu Gly Leu Ala Phe Pro Ser 625 630635 640 Gln Asn Gly Ser Ala Val Ala Ser Arg Trp Pro Ser Leu Val Asp Arg645 650 655 Asn Ala Asp Asp Trp Glu Asn Phe Thr Phe Ser Pro Ala Tyr GluAla 660 665 670 Ser Tyr Asn Arg Ala Thr Ser Thr His Ser Val Ile Glu AspCys Leu 675 680 685 Ile Pro Ile Cys Cys Gly Leu Tyr Glu Leu Leu Ser GlyVal Leu Leu 690 695 700 Val Leu Pro Asp Ala Met Leu Glu Asp Val Met AspArg Ile Ile Gln 705 710 715 720 Ala Asp Ile Leu Leu Val Leu Val Asn HisPro Ser Pro Ala Ile Gln 725 730 735 Gln Gly Val Ile Lys Leu Leu His AlaTyr Ile Asn Arg Ala Ser Lys 740 745 750 Glu Gln Lys Asp Lys Phe Leu LysAsn Arg Gly Phe Ser Leu Leu Ala 755 760 765 Asn Gln Leu Tyr Leu His ArgGly Thr Gln Glu Leu Leu Glu Cys Phe 770 775 780 Val Glu Met Phe 
Phe GlyArg Pro Ile Gly Leu Asp Glu Glu Phe Asp 785 790 795 800 Leu Glu Glu ValLys His Met Glu Leu Phe Gln Lys Trp Ser Val Ile 805 810 815 Pro Val LeuGly Leu Ile Glu Thr Ser Leu Tyr Asp Asn Val Leu Leu 820 825 830 His AsnAla Leu Leu Leu Leu Leu Gln Val Leu Asn Ser Cys Ser Lys 835 840 845 ValAla Asp Met Leu Leu Asp Asn Gly Leu Leu Tyr Val Leu Cys Asn 850 855 860Thr Val Ala Ala Leu Asn Gly Leu Glu Lys Asn Ile Pro Val Asn Glu 865 870875 880 Tyr Lys Leu Leu Ala Cys Asp Ile Gln Gln Leu Phe Ile Ala Val Thr885 890 895 Ile His Ala Cys Ser Ser Ser Gly Thr Gln Tyr Phe Arg Val IleGlu 900 905 910 Asp Leu Ile Val Leu Leu Gly Tyr Leu His Asn Ser Lys AsnLys Arg 915 920 925 Thr Gln Asn Met Ala Leu Ala Leu Gln Leu Arg Val LeuGln Ala Ala 930 935 940 Leu Glu Phe Ile Arg Ser Thr Ala Asn His Asp SerGlu Ser Pro Val 945 950 955 960 His Ser Pro Ser Ala His Arg His Ser ValPro Pro Lys Arg Arg Ser 965 970 975 Ile Ala Gly Ser Arg Lys Phe Pro LeuAla Gln Thr Glu Ser Leu Leu 980 985 990 Met Lys Met Arg Ser Val Ala SerAsp Glu Leu His Ser Met Met Gln 995 1000 1005 Arg Arg Met Ser Gln GluHis Pro Ser Gln Ala Ser Glu Ala Glu Leu 1010 1015 1020 Ala Gln Arg LeuGln Arg Leu Thr Ile Leu Ala Val Asn Arg Ile Ile 1025 1030 1035 1040 TyrGln Glu Leu Asn Ser Asp Ile Ile Asp Ile Leu Arg Thr Pro Glu 1045 10501055 Asn Thr Ser Gln Ser Lys Thr Ser Val Ser Gln Thr Glu Ile Ser Glu1060 1065 1070 Glu Asp Met His His Glu Gln Pro Ser Val Tyr Asn Pro PheGln Lys 1075 1080 1085 Glu Met Leu Thr Tyr Leu Leu Asp Gly Phe Lys ValCys Ile Gly Ser 1090 1095 1100 Ser Lys Thr Ser Val Ser Lys Gln Gln TrpThr Lys Ile Leu Gly Ser 1105 1110 1115 1120 Cys Lys Glu Thr Leu Arg AspGln Leu Gly Arg Leu Leu Ala His Ile 1125 1130 1135 Leu Ser Pro Thr HisThr Val Gln Glu Arg Lys Gln Ile Leu Glu Ile 1140 1145 1150 Val His GluPro Ala His Gln Asp Ile Leu Arg Asp Cys Leu Ser Pro 1155 1160 1165 SerPro Gln His Gly Ala Lys Leu Val Leu Tyr Leu Ser Glu Leu Ile 1170 11751180 His Asn His Gln Asp Glu Leu Ser Glu Glu Glu Met Asp Thr Ala Glu1185 1190 1195 1200 Leu 
Leu Met Asn Ala Leu Lys Leu Cys Gly His Lys CysIle Pro Pro 1205 1210 1215 Ser Ala Pro Ser Lys Pro Glu Leu Ile Lys IleIle Arg Glu Glu Gln 1220 1225 1230 Lys Lys Tyr Glu Ser Glu Glu Ser ValSer Lys Gly Ser Trp Gln Lys 1235 1240 1245 Thr Val Asn Asn Asn Gln GlnSer Leu Phe Gln Arg Leu Asp Phe Lys 1250 1255 1260 Ser Lys Asp Ile SerLys Ile Ala Ala Asp Ile Thr Gln Ala Val Ser 1265 1270 1275 1280 Leu SerGln Gly Ile Glu Arg Lys Lys Val Ile Gln His Ile Arg Gly 1285 1290 1295Met Tyr Lys Val Asp Leu Ser Ala Ser Arg His Trp Gln Glu Cys Ile 13001305 1310 Gln Gln Leu Thr His Asp Arg Ala Val Trp Tyr Asp Pro Ile TyrTyr 1315 1320 1325 Pro Thr Ser Trp Gln Leu Asp Pro Thr Glu Gly Pro AsnArg Glu Arg 1330 1335 1340 Arg Arg Leu Gln Arg Cys Tyr Leu Thr Ile ProAsn Lys Tyr Leu Leu 1345 1350 1355 1360 Arg Asp Arg Gln Lys Ser Glu GlyVal Leu Arg Pro Pro Leu Ser Tyr 1365 1370 1375 Leu Phe Glu Asp Lys ThrHis Ser Ser Phe Ser Ser Thr Val Lys Asp 1380 1385 1390 Lys Ala Ala SerGlu Ser Ile Arg Val Asn Arg Arg Cys Ile Ser Val 1395 1400 1405 Ala ProSer Arg Glu Thr Ala Gly Glu Leu Leu Leu Gly Lys Cys Gly 1410 1415 1420Met Tyr Phe Val Glu Asp Asn Ala Ser Asp Ala Val Glu Ser Ser Ser 14251430 1435 1440 Leu Gln Gly Glu Leu Glu Pro Ala Ser Phe Ser Trp Thr TyrGlu Glu 1445 1450 1455 Ile Lys Glu Val His Arg Arg Trp Trp Gln Leu ArgAsp Asn Ala Val 1460 1465 1470 Glu Ile Phe Leu Thr Asn Gly Arg Thr LeuLeu Leu Ala Phe Asp Asn 1475 1480 1485 Asn Lys Val Arg Asp Asp Val TyrGln Ser Ile Leu Thr Asn Asn Leu 1490 1495 1500 Pro Asn Leu Leu Glu TyrGly Asn Ile Thr Ala Leu Thr Asn Leu Trp 1505 1510 1515 1520 Tyr Ser GlyGln Ile Thr Asn Phe Glu Tyr Leu Thr His Leu Asn Lys 1525 1530 1535 HisAla Gly Arg Ser Phe Asn Asp Leu Met Gln Tyr Pro Val Phe Pro 1540 15451550 Phe Ile Leu Ser Asp Tyr Val Ser Glu Thr Leu Asp Leu Asn Asp Pro1555 1560 1565 Ser Ile Tyr Arg Asn Leu Ser Lys Pro Ile Ala Val Gln TyrLys Glu 1570 1575 1580 Lys Glu Asp Arg Tyr Val Asp Thr Tyr Lys Tyr LeuGlu Glu Glu Tyr 1585 1590 1595 1600 Arg Lys Gly Ala Arg Glu Asp Asp 
ProMet Pro Pro Val Gln Pro Tyr 1605 1610 1615 His Tyr Gly Ser His Tyr SerAsn Ser Gly Thr Val Leu His Phe Leu 1620 1625 1630 Val Arg Met Pro ProPhe Thr Lys Met Phe Leu Ala Tyr Gln Asp Gln 1635 1640 1645 Ser Phe AspIle Pro Asp Arg Thr Phe His Ser Thr Asn Thr Thr Trp 1650 1655 1660 ArgLeu Ser Ser Phe Glu Ser Met Thr Asp Val Lys Glu Leu Ile Pro 1665 16701675 1680 Glu Phe Phe Tyr Leu Pro Glu Phe Leu Val Asn Arg Glu Gly PheAsp 1685 1690 1695 Phe Gly Val Arg Gln Asn Gly Glu Arg Val Asn His ValAsn Leu Pro 1700 1705 1710 Pro Trp Ala Arg Asn Asp Pro Arg Leu Phe IleLeu Ile His Arg Gln 1715 1720 1725 Ala Leu Glu Ser Asp His Val Ser GlnAsn Ile Cys His Trp Ile Asp 1730 1735 1740 Leu Val Phe Gly Tyr Lys GlnLys Gly Lys Ala Ser Val Gln Ala Ile 1745 1750 1755 1760 Asn Val Phe HisPro Ala Thr Tyr Phe Gly Met Asp Val Ser Ala Val 1765 1770 1775 Glu AspPro Val Gln Arg Arg Ala Leu Glu Thr Met Ile Lys Thr Tyr 1780 1785 1790Gly Gln Thr Pro Arg Gln Leu Phe His Thr Ala His Ala Ser Arg Pro 17951800 1805 Gly Ala Lys Leu Asn Ile Glu Gly Glu Leu Pro Ala Ala Val GlyLeu 1810 1815 1820 Leu Val Gln Phe Ala Phe Arg Glu Thr Arg Glu Pro ValLys Glu Val 1825 1830 1835 1840 Thr His Pro Ser Pro Leu Ser Trp Ile LysGly Leu Lys Trp Gly Glu 1845 1850 1855 Tyr Val Gly Ser Pro Ser Ala ProVal Pro Val Val Cys Phe Ser Gln 1860 1865 1870 Pro His Gly Glu Arg PheGly Ser Leu Gln Ala Leu Pro Thr Arg Ala 1875 1880 1885 Ile Cys Gly LeuSer Arg Asn Phe Cys Leu Leu Met Thr Tyr Asn Lys 1890 1895 1900 Glu GlnGly Val Arg Ser Met Asn Asn Thr Asn Ile Gln Trp Ser Ala 1905 1910 19151920 Ile Leu Ser Trp Gly Tyr Ala Asp Asn Ile Leu Arg Leu Lys Ser Lys1925 1930 1935 Gln Ser Glu Pro Pro Ile Asn Phe Ile Gln Ser Ser Gln GlnHis Gln 1940 1945 1950 Val Thr Ser Cys Ala Trp Val Pro Asp Ser Cys GlnLeu Phe Thr Gly 1955 1960 1965 Ser Lys Cys Gly Val Ile Thr Ala Tyr ThrAsn Arg Leu Thr Ser Ser 1970 1975 1980 Thr Pro Ser Glu Ile Glu Met GluSer Gln Met His Leu Tyr Gly His 1985 1990 1995 2000 Thr Glu Glu Ile ThrGly Leu Cys Val Cys Lys Pro Tyr Ser Val 
Met 2005 2010 2015 Ile Ser Val Ser Arg Asp Gly Thr Cys Ile Val Trp Asp Leu Asn Arg 2020 2025 2030 Leu Cys Tyr Val Gln Ser Leu Ala Gly His Lys Ser Pro Val Thr Ala 2035 2040 2045 Val Ser Ala Ser Glu Thr Ser Gly Asp Ile Ala Thr Val Cys Asp Ser 2050 2055 2060 Ala Gly Gly Gly Ser Asp Leu Arg Leu Trp Thr Val Asn Gly Asp Leu 2065 2070 2075 2080 Val Gly His Val His Cys Arg Glu Ile Ile Cys Ser Val Ala Phe Ser 2085 2090 2095 Asn Gln Pro Glu Gly Val Ser Ile Asn Val Ile Ala Gly Gly Leu Glu 2100 2105 2110 Asn Gly Ile Val Arg Leu Trp Ser Thr Trp Asp Leu Lys Pro Val Arg 2115 2120 2125 Glu Ile Thr Phe Pro Lys Ser Asn Lys Pro Ile Ile Ser Leu Thr Phe 2130 2135 2140 Ser Cys Asp Gly His His Leu Tyr Thr Ala Asn Ser Glu Gly Thr Val 2145 2150 2155 2160 Ile Ala Trp Cys Arg Lys Asp Gln Gln Arg Val Lys Leu Pro Met Phe 2165 2170 2175 Tyr Ser Phe Leu Ser Ser Tyr Ala Ala Gly 2180 2185 11 base pairs nucleic acid single linear DNA 3 TTAAAGTAAG C 11 11 base pairs nucleic acid single linear DNA 4 TTTAGCTGCT G 11 11 base pairs nucleic acid single linear DNA 5 TTAAAGTAAG G 11 11 base pairs nucleic acid single linear DNA 6 TGCAGGCTTG T 11 14 base pairs nucleic acid single linear DNA 7 TCCAACTGGT AATA 14 13 base pairs nucleic acid single linear DNA 8 GAGTGAGGTA ACA 13 12616 base pairs nucleic acid single linear DNA CDS 190..11592 9 GCGGCCGCGT CGACGCGGCG GCGGCAGCGG CGTCGGCTCG GGGTTCTCCG GGAGAGGGGG 60 AGTGCGCGGC GGCCGCAGCT GCCACAAACC AGGTGAAGCT TTGTTCTAAG AATATTTGTT 120 TCATCTAGTT TATGAGTCCA AATGATATAG ACTGTAAATG TCACAGCAGT GGTGAAAGAC 180 TGCTCGGTC ATG AGC ACC GAC AGT AAC TCA CTG GCA CGT GAA TTT CTG 228 Met Ser Thr Asp Ser Asn Ser Leu Ala Arg Glu Phe Leu 1 5 10 ACC GAT GTC AAC CGG CTT TGC AAT GCA GTG GTC CAG AGG GTG GAG GCC 276 Thr Asp Val Asn Arg Leu Cys Asn Ala Val Val Gln Arg Val Glu Ala 15 20 25 AGG GAG GAA GAA GAG GAG GAG ACG CAC ATG GCA ACC CTT GGA CAG TAC 324 Arg Glu Glu Glu Glu Glu Glu Thr His Met Ala Thr Leu Gly Gln Tyr 30 35 40 45 CTT GTC CAT GGT CGA GGA TTT CTA TTA CTT ACC AAG CTA AAT TCT ATA 372 Leu Val His Gly Arg Gly
Phe Leu Leu Leu Thr Lys Leu Asn Ser Ile 50 55 60 ATT GAT CAGGCA TTG ACA TGT AGA GAA GAA CTC CTG ACT CTT CTT CTG 420 Ile Asp Gln AlaLeu Thr Cys Arg Glu Glu Leu Leu Thr Leu Leu Leu 65 70 75 TCT CTC CTT CCACTG GTA TGG AAG ATA CCT GTC CAA GAA GAA AAG GCA 468 Ser Leu Leu Pro LeuVal Trp Lys Ile Pro Val Gln Glu Glu Lys Ala 80 85 90 ACA GAT TTT AAC CTACCG CTC TCA GCA GAT ATA ATC CTG ACC AAA GAA 516 Thr Asp Phe Asn Leu ProLeu Ser Ala Asp Ile Ile Leu Thr Lys Glu 95 100 105 AAG AAC TCA AGT TCACAA AGA TCC ACT CAG GAA AAA TTA CAT TTA GAA 564 Lys Asn Ser Ser Ser GlnArg Ser Thr Gln Glu Lys Leu His Leu Glu 110 115 120 125 GGA AGT GCC CTGTCT AGT CAG GTT TCT GCA AAA GTA AAT GTT TTT CGA 612 Gly Ser Ala Leu SerSer Gln Val Ser Ala Lys Val Asn Val Phe Arg 130 135 140 AAA AGC AGA CGACAG CGT AAA ATT ACC CAT CGC TAT TCT GTA AGA GAT 660 Lys Ser Arg Arg GlnArg Lys Ile Thr His Arg Tyr Ser Val Arg Asp 145 150 155 GCA AGA AAG ACACAG CTC TCC ACC TCA GAT TCA GAA GCC AAT TCA GAT 708 Ala Arg Lys Thr GlnLeu Ser Thr Ser Asp Ser Glu Ala Asn Ser Asp 160 165 170 GAA AAA GGC ATAGCA ATG AAT AAG CAT AGA AGG CCC CAT CTG CTG CAT 756 Glu Lys Gly Ile AlaMet Asn Lys His Arg Arg Pro His Leu Leu His 175 180 185 CAT TTT TTA ACATCG TTT CCT AAA CAA GAC CAC CCC AAA GCT AAA CTT 804 His Phe Leu Thr SerPhe Pro Lys Gln Asp His Pro Lys Ala Lys Leu 190 195 200 205 GAC CGC TTAGCA ACC AAA GAA CAG ACT CCT CCA GAT GCT ATG GCT TTG 852 Asp Arg Leu AlaThr Lys Glu Gln Thr Pro Pro Asp Ala Met Ala Leu 210 215 220 GAA AAT TCCAGA GAG ATT ATT CCA AGA CAG GGG TCA AAC ACT GAC ATT 900 Glu Asn Ser ArgGlu Ile Ile Pro Arg Gln Gly Ser Asn Thr Asp Ile 225 230 235 TTA AGT GAGCCA GCT GCC TTG TCT GTT ATC AGT AAC ATG AAC AAT TCT 948 Leu Ser Glu ProAla Ala Leu Ser Val Ile Ser Asn Met Asn Asn Ser 240 245 250 CCA TTT GACTTA TGT CAT GTT TTG TTA TCT TTA TTA GAA AAA GTT TGT 996 Pro Phe Asp LeuCys His Val Leu Leu Ser Leu Leu Glu Lys Val Cys 255 260 265 AAG TTT GACGTT ACC TTG AAT CAT AAT TCT CCT TTA GCA GCC AGT GTA 1044 Lys Phe Asp ValThr Leu Asn His Asn 
Ser Pro Leu Ala Ala Ser Val 270 275 280 285 GTG CCCACA CTA ACT GAA TTC CTA GCA GGC TTT GGG GAC TGC TGC AGT 1092 Val Pro ThrLeu Thr Glu Phe Leu Ala Gly Phe Gly Asp Cys Cys Ser 290 295 300 CTG AGCGAC AAC TTG GAG AGT CGA GTA GTT TCT GCA GGT TGG ACC GAA 1140 Leu Ser AspAsn Leu Glu Ser Arg Val Val Ser Ala Gly Trp Thr Glu 305 310 315 GAA CCGGTG GCT TTG ATT CAA AGG ATG CTC TTT CGA ACA GTG TTG CAT 1188 Glu Pro ValAla Leu Ile Gln Arg Met Leu Phe Arg Thr Val Leu His 320 325 330 CTT CTGTCA GTA GAT GTT AGT ACT GCA GAG ATG ATG CCA GAA AAT CTT 1236 Leu Leu SerVal Asp Val Ser Thr Ala Glu Met Met Pro Glu Asn Leu 335 340 345 AGG AAAAAT TTA ACT GAA TTG CTT AGA GCA GCT TTA AAA ATT AGA ATA 1284 Arg Lys AsnLeu Thr Glu Leu Leu Arg Ala Ala Leu Lys Ile Arg Ile 350 355 360 365 TGCCTA GAA AAG CAG CCT GAC CCT TTT GCA CCA AGA CAA AAG AAA ACA 1332 Cys LeuGlu Lys Gln Pro Asp Pro Phe Ala Pro Arg Gln Lys Lys Thr 370 375 380 CTGCAG GAG GTT CAG GAA GAT TTT GTG TTT TCA AAG TAT CGT CAT AGA 1380 Leu GlnGlu Val Gln Glu Asp Phe Val Phe Ser Lys Tyr Arg His Arg 385 390 395 GCCCTT CTT TTA CCT GAG CTT TTG GAA GGA GTT CTT CAG ATT CTG ATC 1428 Ala LeuLeu Leu Pro Glu Leu Leu Glu Gly Val Leu Gln Ile Leu Ile 400 405 410 TGTTGT CTT CAA AGT GCA GCT TCA AAT CCC TTC TAC TTC AGT CAA GCC 1476 Cys CysLeu Gln Ser Ala Ala Ser Asn Pro Phe Tyr Phe Ser Gln Ala 415 420 425 ATGGAT TTG GTT CAA GAA TTC ATT CAG CAT CAT GGA TTT AAT TTA TTT 1524 Met AspLeu Val Gln Glu Phe Ile Gln His His Gly Phe Asn Leu Phe 430 435 440 445GAA ACA GCA GTT CTT CAA ATG GAA TGG CTG GTT TTA AGA GAT GGA GTT 1572 GluThr Ala Val Leu Gln Met Glu Trp Leu Val Leu Arg Asp Gly Val 450 455 460CCT CCC GAG GCC TCA GAG CAT TTG AAA GCC CTA ATA AAT AGT GTG ATG 1620 ProPro Glu Ala Ser Glu His Leu Lys Ala Leu Ile Asn Ser Val Met 465 470 475AAA ATA ATG AGC ACT GTC AAA AAA GTG AAA TCA GAG CAA CTT CAT CAT 1668 LysIle Met Ser Thr Val Lys Lys Val Lys Ser Glu Gln Leu His His 480 485 490TCG ATG TGT ACA AGA AAA AGG CAC AGA CGA TGT GAA TAT TCT CAT TTT 1716 SerMet Cys Thr Arg 
Lys Arg His Arg Arg Cys Glu Tyr Ser His Phe 495 500 505ATG CAT CAT CAC CGA GAT CTC TCA GGT CTT CTG GTT TCG GCT TTT AAA 1764 MetHis His His Arg Asp Leu Ser Gly Leu Leu Val Ser Ala Phe Lys 510 515 520525 AAC CAG GTT TCC AAA AAC CCA TTT GAA GAG ACT GCA GAT GGA GAT GTT 1812Asn Gln Val Ser Lys Asn Pro Phe Glu Glu Thr Ala Asp Gly Asp Val 530 535540 TAT TAT CCT GAG CGG TGC TGT TGC ATT GCA GTG TGT GCC CAT CAG TGC 1860Tyr Tyr Pro Glu Arg Cys Cys Cys Ile Ala Val Cys Ala His Gln Cys 545 550555 TTG CGC TTA CTA CAG CAG GCT TCC TTG AGC AGC ACT TGT GTC CAG ATC 1908Leu Arg Leu Leu Gln Gln Ala Ser Leu Ser Ser Thr Cys Val Gln Ile 560 565570 CTA TCG GGT GTT CAT AAC ATT GGA ATA TGC TGT TGT ATG GAT CCC AAA 1956Leu Ser Gly Val His Asn Ile Gly Ile Cys Cys Cys Met Asp Pro Lys 575 580585 TCT GTA ATC ATT CCT TTG CTC CAT GCT TTT AAA TTG CCA GCA CTG AAA 2004Ser Val Ile Ile Pro Leu Leu His Ala Phe Lys Leu Pro Ala Leu Lys 590 595600 605 AAT TTT CAG CAG CAT ATA TTG AAT ATC CTT AAC AAA CTT ATT TTG GAT2052 Asn Phe Gln Gln His Ile Leu Asn Ile Leu Asn Lys Leu Ile Leu Asp 610615 620 CAG TTA GGA GGA GCA GAG ATA TCA CCA AAA ATT AAA AAA GCA GCT TGT2100 Gln Leu Gly Gly Ala Glu Ile Ser Pro Lys Ile Lys Lys Ala Ala Cys 625630 635 AAT ATT TGT ACT GTT GAC TCT GAC CAA CTA GCC CAA TTA GAA GAG ACA2148 Asn Ile Cys Thr Val Asp Ser Asp Gln Leu Ala Gln Leu Glu Glu Thr 640645 650 CTG CAG GGA AAC TTA TGT GAT GCT GAA CTC TCC TCA AGT TTA TCC AGT2196 Leu Gln Gly Asn Leu Cys Asp Ala Glu Leu Ser Ser Ser Leu Ser Ser 655660 665 CCT TCT TAC AGA TTT CAA GGG ATC CTG CCC AGC AGT GGA TCT GAA GAT2244 Pro Ser Tyr Arg Phe Gln Gly Ile Leu Pro Ser Ser Gly Ser Glu Asp 670675 680 685 TTG TTG TGG AAA TGG GAT GCT TTA AAG GCT TAT CAG AAC TTT GTTTTT 2292 Leu Leu Trp Lys Trp Asp Ala Leu Lys Ala Tyr Gln Asn Phe Val Phe690 695 700 GAA GAA GAC AGA TTA CAT AGT ATA CAG ATT GCA AAT CAC ATT TGCAAT 2340 Glu Glu Asp Arg Leu His Ser Ile Gln Ile Ala Asn His Ile Cys Asn705 710 715 TTA ATC CAG AAA GGC AAT ATA GTT GTT CAG TGG AAA TTA TAT AATTAC 2388 Leu 
Ile Gln Lys Gly Asn Ile Val Val Gln Trp Lys Leu Tyr Asn Tyr720 725 730 ATA TTT AAT CCT GTG CTC CAA AGA GGA GTT GAA TTA GCA CAT CATTGT 2436 Ile Phe Asn Pro Val Leu Gln Arg Gly Val Glu Leu Ala His His Cys735 740 745 CAA CAC CTA AGC GTT ACT TCA GCT CAA AGT CAT GTA TGT AGC CATCAT 2484 Gln His Leu Ser Val Thr Ser Ala Gln Ser His Val Cys Ser His His750 755 760 765 AAC CAG TGC TTG CCT CAG GAC GTG CTT CAG ATT TAT GTA AAAACT CTG 2532 Asn Gln Cys Leu Pro Gln Asp Val Leu Gln Ile Tyr Val Lys ThrLeu 770 775 780 CCT ATC CTG CTT AAA TCC AGG GTA ATA AGA GAT TTG TTT TTGAGT TGT 2580 Pro Ile Leu Leu Lys Ser Arg Val Ile Arg Asp Leu Phe Leu SerCys 785 790 795 AAT GGA GTA AGT CAA ATA ATC GAA TTA AAT TGC TTA AAT GGTATT CGA 2628 Asn Gly Val Ser Gln Ile Ile Glu Leu Asn Cys Leu Asn Gly IleArg 800 805 810 AGT CAT TCT CTA AAA GCA TTT GAA ACT CTG ATA ATC AGC CTAGGG GAG 2676 Ser His Ser Leu Lys Ala Phe Glu Thr Leu Ile Ile Ser Leu GlyGlu 815 820 825 CAA CAG AAA GAT GCC TCA GTT CCA GAT ATT GAT GGG ATA GACATT GAA 2724 Gln Gln Lys Asp Ala Ser Val Pro Asp Ile Asp Gly Ile Asp IleGlu 830 835 840 845 CAG AAG GAG TTG TCC TCT GTA CAT GTG GGT ACT TCT TTTCAT CAT CAG 2772 Gln Lys Glu Leu Ser Ser Val His Val Gly Thr Ser Phe HisHis Gln 850 855 860 CAA GCT TAT TCA GAT TCT CCT CAG AGT CTC AGC AAA TTTTAT GCT GGC 2820 Gln Ala Tyr Ser Asp Ser Pro Gln Ser Leu Ser Lys Phe TyrAla Gly 865 870 875 CTC AAA GAA GCT TAT CCA AAG AGA CGG AAG ACT GTT AACCAA GAT GTT 2868 Leu Lys Glu Ala Tyr Pro Lys Arg Arg Lys Thr Val Asn GlnAsp Val 880 885 890 CAT ATC AAC ACA ATA AAC CTA TTC CTC TGT GTG GCT TTTTTA TGC GTA 2916 His Ile Asn Thr Ile Asn Leu Phe Leu Cys Val Ala Phe LeuCys Val 895 900 905 AGT AAA GAA GCA GAG TCT GAC AGG GAG TCG GCC AAT GACTCA GAA GAT 2964 Ser Lys Glu Ala Glu Ser Asp Arg Glu Ser Ala Asn Asp SerGlu Asp 910 915 920 925 ACT TCT GGC TAT GAC AGC ACA GCC AGC GAG CCT TTAAGT CAT ATG CTG 3012 Thr Ser Gly Tyr Asp Ser Thr Ala Ser Glu Pro Leu SerHis Met Leu 930 935 940 CCA TGT ATA TCT CTC GAG AGC CTT GTC TTG CCT TCTCCT GAA 
CAT ATG 3060 Pro Cys Ile Ser Leu Glu Ser Leu Val Leu Pro Ser ProGlu His Met 945 950 955 CAC CAA GCA GCA GAC ATT TGG TCT ATG TGT CGT TGGATC TAC ATG TTG 3108 His Gln Ala Ala Asp Ile Trp Ser Met Cys Arg Trp IleTyr Met Leu 960 965 970 AGT TCA GTG TTC CAG AAA CAG TTT TAT AGG CTT GGTGGT TTC CGA GTA 3156 Ser Ser Val Phe Gln Lys Gln Phe Tyr Arg Leu Gly GlyPhe Arg Val 975 980 985 TGC CAT AAG TTA ATA TTT ATG ATA ATA CAG AAA CTGTTC AGA AGT CAC 3204 Cys His Lys Leu Ile Phe Met Ile Ile Gln Lys Leu PheArg Ser His 990 995 1000 1005 AAA GAG GAG CAA GGA AAA AAG GAG GGA GATACA AGT GTA AAT GAA AAC 3252 Lys Glu Glu Gln Gly Lys Lys Glu Gly Asp ThrSer Val Asn Glu Asn 1010 1015 1020 CAG GAT TTA AAC AGA ATT TCT CAA CCTAAG AGA ACT ATG AAG GAA GAT 3300 Gln Asp Leu Asn Arg Ile Ser Gln Pro LysArg Thr Met Lys Glu Asp 1025 1030 1035 TTA TTA TCT TTG GCT ATA AAA AGTGAC CCC ATA CCA TCA GAA CTA GGT 3348 Leu Leu Ser Leu Ala Ile Lys Ser AspPro Ile Pro Ser Glu Leu Gly 1040 1045 1050 AGT CTA AAA AAG AGT GCT GACAGT TTA GGT AAA TTA GAG TTA CAG CAT 3396 Ser Leu Lys Lys Ser Ala Asp SerLeu Gly Lys Leu Glu Leu Gln His 1055 1060 1065 ATT TCT TCC ATA AAT GTGGAA GAA GTT TCA GCT ACT GAA GCC GCT CCC 3444 Ile Ser Ser Ile Asn Val GluGlu Val Ser Ala Thr Glu Ala Ala Pro 1070 1075 1080 1085 GAG GAA GCA AAGCTA TTT ACA AGT CAA GAA AGT GAG ACC TCA CTT CAA 3492 Glu Glu Ala Lys LeuPhe Thr Ser Gln Glu Ser Glu Thr Ser Leu Gln 1090 1095 1100 AGT ATA CGACTT TTG GAA GCC CTT CTG GCC ATT TGT CTT CAT GGT GCC 3540 Ser Ile Arg LeuLeu Glu Ala Leu Leu Ala Ile Cys Leu His Gly Ala 1105 1110 1115 AGA ACTAGT CAA CAG AAG ATG GAA TTG GAG TTA CCT AAT CAG AAC TTG 3588 Arg Thr SerGln Gln Lys Met Glu Leu Glu Leu Pro Asn Gln Asn Leu 1120 1125 1130 TCTGTG GAA AGT ATA TTA TTT GAA ATG AGG GAC CAT CTT TCC CAG TCA 3636 Ser ValGlu Ser Ile Leu Phe Glu Met Arg Asp His Leu Ser Gln Ser 1135 1140 1145AAG GTG ATT GAA ACA CAA CTA GCA AAG CCG TTA TTT GAT GCC CTG CTT 3684 LysVal Ile Glu Thr Gln Leu Ala Lys Pro Leu Phe Asp Ala Leu Leu 1150 11551160 1165 CGA GTT 
GCC CTC GGG AAT TAT TCA GCA GAT TTT GAA CAT AAT GATGCT 3732 Arg Val Ala Leu Gly Asn Tyr Ser Ala Asp Phe Glu His Asn Asp Ala1170 1175 1180 ATG ACT GAG AAG AGT CAT CAA TCT GCA GAA GAA TTG TCA TCCCAG CCT 3780 Met Thr Glu Lys Ser His Gln Ser Ala Glu Glu Leu Ser Ser GlnPro 1185 1190 1195 GGT GAT TTT TCA GAA GAA GCT GAG GAT TCT CAG TGT TGTAGT TTT AAA 3828 Gly Asp Phe Ser Glu Glu Ala Glu Asp Ser Gln Cys Cys SerPhe Lys 1200 1205 1210 CTT TTA GTT GAA GAA GAA GGT TAC GAA GCA GAT AGTGAA AGC AAT CCT 3876 Leu Leu Val Glu Glu Glu Gly Tyr Glu Ala Asp Ser GluSer Asn Pro 1215 1220 1225 GAA GAT GGC GAA ACC CAG GAT GAT GGG GTA GACTTA AAG TCT GAA ACA 3924 Glu Asp Gly Glu Thr Gln Asp Asp Gly Val Asp LeuLys Ser Glu Thr 1230 1235 1240 1245 GAA GGT TTC AGT GCA TCA AGC AGT CCAAAT GAC TTA CTC GAA AAC CTC 3972 Glu Gly Phe Ser Ala Ser Ser Ser Pro AsnAsp Leu Leu Glu Asn Leu 1250 1255 1260 ACT CAA GGG GAA ATA ATT TAT CCTGAG ATT TGT ATG CTG GAA TTA AAT 4020 Thr Gln Gly Glu Ile Ile Tyr Pro GluIle Cys Met Leu Glu Leu Asn 1265 1270 1275 TTG CTT TCT GCT AGT AAA GCCAAA CTT GAT GTG CTT GCC CAT GTA TTT 4068 Leu Leu Ser Ala Ser Lys Ala LysLeu Asp Val Leu Ala His Val Phe 1280 1285 1290 GAG AGT TTT TTG AAA ATTATT AGG CAG AAA GAA AAG AAT GTT TTT CTG 4116 Glu Ser Phe Leu Lys Ile IleArg Gln Lys Glu Lys Asn Val Phe Leu 1295 1300 1305 CTC ATG CAA CAG GGAACT GTG AAA AAT CTT TTA GGA GGG TTC TTG AGT 4164 Leu Met Gln Gln Gly ThrVal Lys Asn Leu Leu Gly Gly Phe Leu Ser 1310 1315 1320 1325 ATT TTA ACACAG GAT GAT TCT GAT TTT CAA GCA TGC CAG AGA GTA TTG 4212 Ile Leu Thr GlnAsp Asp Ser Asp Phe Gln Ala Cys Gln Arg Val Leu 1330 1335 1340 GTG GATCTT TTG GTA TCT TTG ATG AGT TCA AGA ACA TGT TCA GAA GAG 4260 Val Asp LeuLeu Val Ser Leu Met Ser Ser Arg Thr Cys Ser Glu Glu 1345 1350 1355 CTAACC CTT CTT TTG AGA ATA TTT CTG GAG AAA TCT CCT TGT ACA AAA 4308 Leu ThrLeu Leu Leu Arg Ile Phe Leu Glu Lys Ser Pro Cys Thr Lys 1360 1365 1370ATT CTT CTT CTG GGT ATT CTG AAA ATT ATT GAA AGT GAT ACT ACT ATG 4356 IleLeu Leu Leu Gly Ile Leu Lys 
Ile Ile Glu Ser Asp Thr Thr Met 1375 13801385 AGC CCT TCA CAG TAT CTA ACC TTC CCT TTA CTG CAC GCT CCA AAT TTA4404 Ser Pro Ser Gln Tyr Leu Thr Phe Pro Leu Leu His Ala Pro Asn Leu1390 1395 1400 1405 AGC AAC GGT GTT TCA TCA CAA AAG TAT CCT GGG ATT TTAAAC AGT AAG 4452 Ser Asn Gly Val Ser Ser Gln Lys Tyr Pro Gly Ile Leu AsnSer Lys 1410 1415 1420 GCC ATG GGT TTA TTG AGA AGA GCA CGA GTT TCA CGGAGC AAG AAA GAG 4500 Ala Met Gly Leu Leu Arg Arg Ala Arg Val Ser Arg SerLys Lys Glu 1425 1430 1435 GCT GAT AGA GAG AGT TTT CCC CAT CGG CTG CTTTCA TCT TGG CAC ATA 4548 Ala Asp Arg Glu Ser Phe Pro His Arg Leu Leu SerSer Trp His Ile 1440 1445 1450 GCC CCA GTC CAC CTG CCG TTG CTG GGG CAAAAC TGC TGG CCA CAC CTA 4596 Ala Pro Val His Leu Pro Leu Leu Gly Gln AsnCys Trp Pro His Leu 1455 1460 1465 TCA GAA GGT TTC AGT GTT TCC CTG TGGTTT AAT GTG GAG TGT ATC CAT 4644 Ser Glu Gly Phe Ser Val Ser Leu Trp PheAsn Val Glu Cys Ile His 1470 1475 1480 1485 GAA GCT GAG AGT ACT ACA GAAAAA GGA AAG AAG ATA AAG AAA AGA AAC 4692 Glu Ala Glu Ser Thr Thr Glu LysGly Lys Lys Ile Lys Lys Arg Asn 1490 1495 1500 AAA TCA TTA ATT TTA CCAGAT AGC AGT TTT GAT GGT ACA GAG AGC GAC 4740 Lys Ser Leu Ile Leu Pro AspSer Ser Phe Asp Gly Thr Glu Ser Asp 1505 1510 1515 AGA CCA GAA GGT GCAGAG TAC ATA AAT CCT GGT GAA AGA CTC ATA GAA 4788 Arg Pro Glu Gly Ala GluTyr Ile Asn Pro Gly Glu Arg Leu Ile Glu 1520 1525 1530 GAA GGA TGT ATTCAT ATA ATT TCA CTG GGA TCC AAA GCG TTG ATG ATC 4836 Glu Gly Cys Ile HisIle Ile Ser Leu Gly Ser Lys Ala Leu Met Ile 1535 1540 1545 CAA GTG TGGGCT GAT CCC CAC AAT GCC ACT CTT ATC TTT CGT GTG TGC 4884 Gln Val Trp AlaAsp Pro His Asn Ala Thr Leu Ile Phe Arg Val Cys 1550 1555 1560 1565 ATGGAT TCA AAT GAT GAC ATG AAA GCT GTT TTA CTA GCA CAG GTT GAA 4932 Met AspSer Asn Asp Asp Met Lys Ala Val Leu Leu Ala Gln Val Glu 1570 1575 1580TCA CAG GAG AAT ATT TTC CTC CCA AGC AAA TGG CAA CAT TTA GTA CTC 4980 SerGln Glu Asn Ile Phe Leu Pro Ser Lys Trp Gln His Leu Val Leu 1585 15901595 ACC TAC TTA CAG CAG CCC CAA GGG AAA AGG 
AGG ATT CAT GGG AAA ATC5028 Thr Tyr Leu Gln Gln Pro Gln Gly Lys Arg Arg Ile His Gly Lys Ile1600 1605 1610 TCC ATA TGG GTC TCT GGA CAG AGG AAG CCT GAT GTT ACT TTGGAT TTT 5076 Ser Ile Trp Val Ser Gly Gln Arg Lys Pro Asp Val Thr Leu AspPhe 1615 1620 1625 ATG CTT CCA AGA AAA ACA AGT TTG TCA TCT GAT AGC AATAAA ACA TTT 5124 Met Leu Pro Arg Lys Thr Ser Leu Ser Ser Asp Ser Asn LysThr Phe 1630 1635 1640 1645 TGC ATG ATT GGC CAT TGT TTA TCA TCC CAA GAAGAG TTT TTG CAG TTG 5172 Cys Met Ile Gly His Cys Leu Ser Ser Gln Glu GluPhe Leu Gln Leu 1650 1655 1660 GCT GGA AAA TGG GAC CTG GGA AAT TTG CTTCTC TTC AAC GGA GCT AAG 5220 Ala Gly Lys Trp Asp Leu Gly Asn Leu Leu LeuPhe Asn Gly Ala Lys 1665 1670 1675 GTT GGT TCA CAA GAG GCC TTT TAT CTGTAT GCT TGT GGA CCC AAC CAT 5268 Val Gly Ser Gln Glu Ala Phe Tyr Leu TyrAla Cys Gly Pro Asn His 1680 1685 1690 ACA TCT GTA ATG CCA TGT AAG TATGGC AAG CCA GTC AAT GAC TAC TCC 5316 Thr Ser Val Met Pro Cys Lys Tyr GlyLys Pro Val Asn Asp Tyr Ser 1695 1700 1705 AAA TAT ATT AAT AAA GAA ATTTTG CGA TGT GAA CAA ATC AGA GAA CTT 5364 Lys Tyr Ile Asn Lys Glu Ile LeuArg Cys Glu Gln Ile Arg Glu Leu 1710 1715 1720 1725 TTT ATG ACC AAG AAAGAT GTG GAT ATT GGT CTC TTA ATT GAA AGT CTT 5412 Phe Met Thr Lys Lys AspVal Asp Ile Gly Leu Leu Ile Glu Ser Leu 1730 1735 1740 TCA GTT GTT TATACA ACT TAC TGT CCT GCT CAG TAT ACC ATC TAT GAA 5460 Ser Val Val Tyr ThrThr Tyr Cys Pro Ala Gln Tyr Thr Ile Tyr Glu 1745 1750 1755 CCA GTG ATTAGA CTT AAA GGT CAA ATG AAA ACC CAA CTC TCT CAA AGA 5508 Pro Val Ile ArgLeu Lys Gly Gln Met Lys Thr Gln Leu Ser Gln Arg 1760 1765 1770 CCC TTCAGC TCA AAA GAA GTT CAG AGC ATC TTA TTA GAA CCT CAT CAT 5556 Pro Phe SerSer Lys Glu Val Gln Ser Ile Leu Leu Glu Pro His His 1775 1780 1785 CTAAAA AAT CTC CAA CCT ACT GAA TAT AAA ACT ATT CAA GGC ATT CTG 5604 Leu LysAsn Leu Gln Pro Thr Glu Tyr Lys Thr Ile Gln Gly Ile Leu 1790 1795 18001805 CAC GAA ATT GGT GGA ACT GGC ATA TTT GTT TTT CTC TTT GCC AGG GTT5652 His Glu Ile Gly Gly Thr Gly Ile Phe Val Phe Leu Phe Ala Arg 
Val1810 1815 1820 GTT GAA CTC AGT AGC TGT GAA GAA ACT CAA GCA TTA GCA CTGCGA GTT 5700 Val Glu Leu Ser Ser Cys Glu Glu Thr Gln Ala Leu Ala Leu ArgVal 1825 1830 1835 ATA CTC TCA TTA ATT AAA TAC AAC CAA CAA AGA GTA CATGAA TTA GAA 5748 Ile Leu Ser Leu Ile Lys Tyr Asn Gln Gln Arg Val His GluLeu Glu 1840 1845 1850 AAT TGT AAT GGA CTT TCT ATG ATT CAT CAG GTG TTGATC AAA CAA AAA 5796 Asn Cys Asn Gly Leu Ser Met Ile His Gln Val Leu IleLys Gln Lys 1855 1860 1865 TGC ATT GTT GGG TTT TAC ATT TTG AAG ACC CTTCTT GAA GGA TGC TGT 5844 Cys Ile Val Gly Phe Tyr Ile Leu Lys Thr Leu LeuGlu Gly Cys Cys 1870 1875 1880 1885 GGT GAA GAT ATT ATT TAT ATG AAT GAGAAT GGA GAG TTT AAG TTG GAT 5892 Gly Glu Asp Ile Ile Tyr Met Asn Glu AsnGly Glu Phe Lys Leu Asp 1890 1895 1900 GTA GAC TCT AAT GCT ATA ATC CAAGAT GTT AAG CTG TTA GAG GAA CTA 5940 Val Asp Ser Asn Ala Ile Ile Gln AspVal Lys Leu Leu Glu Glu Leu 1905 1910 1915 TTG CTT GAC TGG AAG ATA TGGAGT AAA GCA GAG CAA GGT GTT TGG GAA 5988 Leu Leu Asp Trp Lys Ile Trp SerLys Ala Glu Gln Gly Val Trp Glu 1920 1925 1930 ACT TTG CTA GCA GCT CTAGAA GTC CTC ATC AGA GCA GAT CAC CAC CAG 6036 Thr Leu Leu Ala Ala Leu GluVal Leu Ile Arg Ala Asp His His Gln 1935 1940 1945 CAG ATG TTT AAT ATTAAG CAG TTA TTG AAA GCT CAA GTG GTT CAT CAC 6084 Gln Met Phe Asn Ile LysGln Leu Leu Lys Ala Gln Val Val His His 1950 1955 1960 1965 TTT CTA CTGACT TGT CAG GTT TTG CAG GAA TAC AAA GAG GGG CAA CTC 6132 Phe Leu Leu ThrCys Gln Val Leu Gln Glu Tyr Lys Glu Gly Gln Leu 1970 1975 1980 ACA CCCATG CCC CGA GAG GTT TGT AGA TCA TTT GTG AAA ATT ATA GCA 6180 Thr Pro MetPro Arg Glu Val Cys Arg Ser Phe Val Lys Ile Ile Ala 1985 1990 1995 GAAGTC CTT GGA TCT CCT CCA GAT TTG GAA TTA TTG ACA ATT ATC TTC 6228 Glu ValLeu Gly Ser Pro Pro Asp Leu Glu Leu Leu Thr Ile Ile Phe 2000 2005 2010AAT TTC CTT TTA GCA GTT CAC CCT CCT ACT AAT ACT TAC GTT TGT CAC 6276 AsnPhe Leu Leu Ala Val His Pro Pro Thr Asn Thr Tyr Val Cys His 2015 20202025 AAT CCC ACG AAC TTC TAC TTT TCT TTG CAC ATA GAT GGC AAG ATC TTT6324 Asn 
Pro Thr Asn Phe Tyr Phe Ser Leu His Ile Asp Gly Lys Ile Phe2030 2035 2040 2045 CAG GAG AAA GTG CGG TCA ATC ATG TAC CTG AGG CAT TCCAGC AGT GGA 6372 Gln Glu Lys Val Arg Ser Ile Met Tyr Leu Arg His Ser SerSer Gly 2050 2055 2060 GGA AGG TCC CTT ATG AGC CCT GGA TTT ATG GTA ATAAGC CCA TCT GGT 6420 Gly Arg Ser Leu Met Ser Pro Gly Phe Met Val Ile SerPro Ser Gly 2065 2070 2075 TTT ACT GCT TCA CCA TAT GAA GGA GAG AAT TCCTCT AAT ATT ATT CCA 6468 Phe Thr Ala Ser Pro Tyr Glu Gly Glu Asn Ser SerAsn Ile Ile Pro 2080 2085 2090 CAA CAG ATG GCC GCC CAT ATG CTG CGT TCTAGA AGC CTA CCA GCA TTC 6516 Gln Gln Met Ala Ala His Met Leu Arg Ser ArgSer Leu Pro Ala Phe 2095 2100 2105 CCT ACT TCT TCA CTA CTA ACG CAA TCACAA AAA CTG ACT GGA AGT TTG 6564 Pro Thr Ser Ser Leu Leu Thr Gln Ser GlnLys Leu Thr Gly Ser Leu 2110 2115 2120 2125 GGT TGT AGT ATC GAC AGG TTACAA AAT ATT GCA GAT ACT TAT GTT GCC 6612 Gly Cys Ser Ile Asp Arg Leu GlnAsn Ile Ala Asp Thr Tyr Val Ala 2130 2135 2140 ACC CAA TCA AAG AAA CAAAAT TCT TTG GGG AGT TCC GAC ACA CTG AAA 6660 Thr Gln Ser Lys Lys Gln AsnSer Leu Gly Ser Ser Asp Thr Leu Lys 2145 2150 2155 AAA GGC AAA GAG GACGCA TTC ATC AGT AGC TGT GAG TCT GCA AAA ACT 6708 Lys Gly Lys Glu Asp AlaPhe Ile Ser Ser Cys Glu Ser Ala Lys Thr 2160 2165 2170 GTT TGT GAA ATGGAA GCT GTC CTC TCA GCC CAG GTC TCT GTC AGT GAT 6756 Val Cys Glu Met GluAla Val Leu Ser Ala Gln Val Ser Val Ser Asp 2175 2180 2185 GTC CCA AAGGGA GTG CTG GGA TTT CCA GTG GTC AAA GCA GAT CAT AAA 6804 Val Pro Lys GlyVal Leu Gly Phe Pro Val Val Lys Ala Asp His Lys 2190 2195 2200 2205 CAGTTG GGA GCA GAA CCC AGG TCA GAA GAT GAC AGT CCT GGG GAT GAG 6852 Gln LeuGly Ala Glu Pro Arg Ser Glu Asp Asp Ser Pro Gly Asp Glu 2210 2215 2220TCC TGC CCA CGC CGA CCT GAT TAC CTA AAG GGA TTG GCC TCC TTC CAG 6900 SerCys Pro Arg Arg Pro Asp Tyr Leu Lys Gly Leu Ala Ser Phe Gln 2225 22302235 CGA AGC CAC AGC ACT ATT GCA AGC CTT GGG CTA GCT TTT CCT TCA CAG6948 Arg Ser His Ser Thr Ile Ala Ser Leu Gly Leu Ala Phe Pro Ser Gln2240 2245 2250 AAC GGA TCT 
GCA GCT GTT GGC CGT TGG CCA AGT CTT GTT GATAGA AAC 6996 Asn Gly Ser Ala Ala Val Gly Arg Trp Pro Ser Leu Val Asp ArgAsn 2255 2260 2265 ACT GAT GAT TGG GAA AAC TTT GCC TAT TCT CTT GGT TATGAG CCA AAT 7044 Thr Asp Asp Trp Glu Asn Phe Ala Tyr Ser Leu Gly Tyr GluPro Asn 2270 2275 2280 2285 TAC AAC CGA ACT GCA AGT GCT CAC AGT GTA ACTGAA GAC TGT TTG GTA 7092 Tyr Asn Arg Thr Ala Ser Ala His Ser Val Thr GluAsp Cys Leu Val 2290 2295 2300 CCT ATA TGC TGT GGA TTA TAT GAA CTC CTAAGT GGG GTT CTT CTT ATC 7140 Pro Ile Cys Cys Gly Leu Tyr Glu Leu Leu SerGly Val Leu Leu Ile 2305 2310 2315 CTG CCT GAT GTT TTG CTT GAA GAT GTGATG GAC AAG CTT ATT CAA GCA 7188 Leu Pro Asp Val Leu Leu Glu Asp Val MetAsp Lys Leu Ile Gln Ala 2320 2325 2330 GAT ACA CTT TTG GTC CTC GTT AACCAC CCA TCA CCA GCT ATA CAA CAA 7236 Asp Thr Leu Leu Val Leu Val Asn HisPro Ser Pro Ala Ile Gln Gln 2335 2340 2345 GGT GTT ATT AAA CTA TTA GATGCA TAT TTT GCT AGA GCA TCT AAG GAA 7284 Gly Val Ile Lys Leu Leu Asp AlaTyr Phe Ala Arg Ala Ser Lys Glu 2350 2355 2360 2365 CAA AAA GAT AAA TTTCTG AAG AAT CGT GGA TTT TCC TTG CTA GCC AAC 7332 Gln Lys Asp Lys Phe LeuLys Asn Arg Gly Phe Ser Leu Leu Ala Asn 2370 2375 2380 CAG TTG TAT CTTCAT CGA GGA ACT CAA GAA TTG TTA GAA TGC TTC ATC 7380 Gln Leu Tyr Leu HisArg Gly Thr Gln Glu Leu Leu Glu Cys Phe Ile 2385 2390 2395 GAA ATG TTCTTT GGT CGA CAT ATT GGC CTT GAT GAA GAA TTT GAT CTG 7428 Glu Met Phe PheGly Arg His Ile Gly Leu Asp Glu Glu Phe Asp Leu 2400 2405 2410 GAA GATGTG AGA AAC ATG GGA TTG TTT CAG AAG TGG TCT GTC ATT CCT 7476 Glu Asp ValArg Asn Met Gly Leu Phe Gln Lys Trp Ser Val Ile Pro 2415 2420 2425 ATTCTG GGA CTA ATA GAG ACC TCT CTA TAT GAC AAC ATA CTC TTG CAT 7524 Ile LeuGly Leu Ile Glu Thr Ser Leu Tyr Asp Asn Ile Leu Leu His 2430 2435 24402445 AAT GCT CTT TTA CTT CTT CTC CAA ATT TTA AAT TCT TGT TCT AAG GTA7572 Asn Ala Leu Leu Leu Leu Leu Gln Ile Leu Asn Ser Cys Ser Lys Val2450 2455 2460 GCA GAT ATG TTG CTG GAT AAT GGT CTA CTC TAT GTG TTA TGTAAT ACA 7620 Ala Asp Met Leu Leu Asp Asn Gly 
Leu Leu Tyr Val Leu Cys AsnThr 2465 2470 2475 GTA GCA GCC CTG AAT GGA TTA GAA AAG AAC ATT CCC ATGAGT GAA TAT 7668 Val Ala Ala Leu Asn Gly Leu Glu Lys Asn Ile Pro Met SerGlu Tyr 2480 2485 2490 AAA TTG CTT GCT TGT GAT ATA CAG CAA CTT TTC ATAGCA GTT ACA ATT 7716 Lys Leu Leu Ala Cys Asp Ile Gln Gln Leu Phe Ile AlaVal Thr Ile 2495 2500 2505 CAT GCT TGC AGT TCC TCA GGC TCA CAA TAT TTTAGG GTT ATT GAA GAC 7764 His Ala Cys Ser Ser Ser Gly Ser Gln Tyr Phe ArgVal Ile Glu Asp 2510 2515 2520 2525 CTT ATT GTA ATG CTT GGA TAT CTT CAAAAT AGC AAA AAC AAG AGG ACA 7812 Leu Ile Val Met Leu Gly Tyr Leu Gln AsnSer Lys Asn Lys Arg Thr 2530 2535 2540 CAA AAT ATG GCT GTT GCA CTA CAGCTT AGA GTT CTC CAG GCT GCT ATG 7860 Gln Asn Met Ala Val Ala Leu Gln LeuArg Val Leu Gln Ala Ala Met 2545 2550 2555 GAA TTT ATA AGG ACC ACC GCAAAT CAT GAC TCT GAA AAC CTC ACA GAT 7908 Glu Phe Ile Arg Thr Thr Ala AsnHis Asp Ser Glu Asn Leu Thr Asp 2560 2565 2570 TCA CTC CAG TCA CCT TCTGCT CCC CAT CAT GCA GTA GTT CAA AAG CGG 7956 Ser Leu Gln Ser Pro Ser AlaPro His His Ala Val Val Gln Lys Arg 2575 2580 2585 AAA AGC ATT GCT GGTCCT CGA AAA TTT CCC CTT GCT CAA ACT GAA TCG 8004 Lys Ser Ile Ala Gly ProArg Lys Phe Pro Leu Ala Gln Thr Glu Ser 2590 2595 2600 2605 CTT CTG ATGAAA ATG CGT TCA GTG GCA AAT GAT GAG CTT CAT GTG ATG 8052 Leu Leu Met LysMet Arg Ser Val Ala Asn Asp Glu Leu His Val Met 2610 2615 2620 ATG CAACGG AGA ATG AGC CAA GAG AAC CCT AGC CAA GCA ACT GAA ACG 8100 Met Gln ArgArg Met Ser Gln Glu Asn Pro Ser Gln Ala Thr Glu Thr 2625 2630 2635 GAACTT GCG CAG AGA CTA CAG AGG CTC ACT GTT TTA GCA GTC AAC AGG 8148 Glu LeuAla Gln Arg Leu Gln Arg Leu Thr Val Leu Ala Val Asn Arg 2640 2645 2650ATT ATT TAT CAA GAA TTT AAT TCA GAC ATT ATT GAC ATT TTG AGA ACT 8196 IleIle Tyr Gln Glu Phe Asn Ser Asp Ile Ile Asp Ile Leu Arg Thr 2655 26602665 CCA GAA AAT GTA ACT CAA AGC AAG ACC TCA GTT TTC CAG ACC GAA ATT8244 Pro Glu Asn Val Thr Gln Ser Lys Thr Ser Val Phe Gln Thr Glu Ile2670 2675 2680 2685 TCT GAG GAA AAT ATT CAT CAT GAA CAG TCT 
TCT GTT TTCAAT CCA TTT 8292 Ser Glu Glu Asn Ile His His Glu Gln Ser Ser Val Phe AsnPro Phe 2690 2695 2700 CAG AAA GAA ATT TTT ACA TAT CTG GTA GAA GGA TTCAAA GTA TCT ATT 8340 Gln Lys Glu Ile Phe Thr Tyr Leu Val Glu Gly Phe LysVal Ser Ile 2705 2710 2715 GGT TCA AGT AAA GCC AGT GGT TCC AAG CAG CAATGG ACT AAA ATT CTG 8388 Gly Ser Ser Lys Ala Ser Gly Ser Lys Gln Gln TrpThr Lys Ile Leu 2720 2725 2730 TGG TCT TGT AAG GAG ACC TTC CGA ATG CAGCTT GGG AGA CTA CTA GTG 8436 Trp Ser Cys Lys Glu Thr Phe Arg Met Gln LeuGly Arg Leu Leu Val 2735 2740 2745 CAT ATT TTG TCG CCA GCC CAC GCT GCACAA GAG AGA AAG CAA ATT TTT 8484 His Ile Leu Ser Pro Ala His Ala Ala GlnGlu Arg Lys Gln Ile Phe 2750 2755 2760 2765 GAA ATA GTT CAT GAA CCA AATCAT CAG GAA ATA CTA CGA GAC TGT CTC 8532 Glu Ile Val His Glu Pro Asn HisGln Glu Ile Leu Arg Asp Cys Leu 2770 2775 2780 AGC CCA TCC CTA CAA CATGGA GCC AAG TTA GTT TTG TAT TTG TCA GAG 8580 Ser Pro Ser Leu Gln His GlyAla Lys Leu Val Leu Tyr Leu Ser Glu 2785 2790 2795 TTG ATA CAT AAT CACCAA GGT GAA TTG ACT GAA GAA GAG CTA GGC ACA 8628 Leu Ile His Asn His GlnGly Glu Leu Thr Glu Glu Glu Leu Gly Thr 2800 2805 2810 GCA GAA CTG CTTATG AAT GCT TTG AAG TTA TGT GGT CAC AAG TGC ATC 8676 Ala Glu Leu Leu MetAsn Ala Leu Lys Leu Cys Gly His Lys Cys Ile 2815 2820 2825 CCT CCC AGTGCA TCA ACA AAA GCA GAC CTT ATT AAA ATG ATC AAA GAG 8724 Pro Pro Ser AlaSer Thr Lys Ala Asp Leu Ile Lys Met Ile Lys Glu 2830 2835 2840 2845 GAACAA AAG AAA TAT GAA ACT GAA GAA GGA GTG AAT AAA GCT GCT TGG 8772 Glu GlnLys Lys Tyr Glu Thr Glu Glu Gly Val Asn Lys Ala Ala Trp 2850 2855 2860CAG AAA ACA GTT AAC AAT AAT CAA CAA AGT CTC TTT CAG CGT CTG GAT 8820 GlnLys Thr Val Asn Asn Asn Gln Gln Ser Leu Phe Gln Arg Leu Asp 2865 28702875 TCA AAA TCA AAG GAT ATA TCT AAA ATA GCT GCA GAT ATC ACC CAG GCA8868 Ser Lys Ser Lys Asp Ile Ser Lys Ile Ala Ala Asp Ile Thr Gln Ala2880 2885 2890 GTG TCT CTC TCC CAA GGA AAT GAG AGA AAA AAG GTG ATC CAGCAT ATT 8916 Val Ser Leu Ser Gln Gly Asn Glu Arg Lys Lys Val Ile Gln HisIle 
2895 2900 2905 AGA GGA ATG TAT AAA GTA GAT TTG AGT GCC AGC AGA CATTGG CAG GAA 8964 Arg Gly Met Tyr Lys Val Asp Leu Ser Ala Ser Arg His TrpGln Glu 2910 2915 2920 2925 CTT ATT CAG CAG CTG ACA CAT GAT AGA GCA GTATGG TAT GAC CCC ATC 9012 Leu Ile Gln Gln Leu Thr His Asp Arg Ala Val TrpTyr Asp Pro Ile 2930 2935 2940 TAC TAT CCA ACC TCA TGG CAG TTG GAT CCAACA GAA GGG CCA AAT CGA 9060 Tyr Tyr Pro Thr Ser Trp Gln Leu Asp Pro ThrGlu Gly Pro Asn Arg 2945 2950 2955 GAG AGG AGA CGT TTA CAG AGA TGT TATTTA ACT ATT CCA AAT AAG TAT 9108 Glu Arg Arg Arg Leu Gln Arg Cys Tyr LeuThr Ile Pro Asn Lys Tyr 2960 2965 2970 CTC CTT AGG GAT AGA CAG AAA TCAGAA GAT GTT GTC AAA CCA CCA CTC 9156 Leu Leu Arg Asp Arg Gln Lys Ser GluAsp Val Val Lys Pro Pro Leu 2975 2980 2985 TCT TAC CTG TTT GAA GAC AAAACT CAT TCT TCT TTC TCT TCT ACT GTC 9204 Ser Tyr Leu Phe Glu Asp Lys ThrHis Ser Ser Phe Ser Ser Thr Val 2990 2995 3000 3005 AAA GAC AAA GCT GCAAGT GAA TCT ATA AGA GTG AAT CGA AGA TGC ATC 9252 Lys Asp Lys Ala Ala SerGlu Ser Ile Arg Val Asn Arg Arg Cys Ile 3010 3015 3020 AGT GTT GCA CCATCT AGA GAG ACA GCT GGT GAA TTG TTA CTA GGT AAA 9300 Ser Val Ala Pro SerArg Glu Thr Ala Gly Glu Leu Leu Leu Gly Lys 3025 3030 3035 TGT GGA ATGTAT TTT GTG GAA GAT AAT GCT TCT GAT ACA GTT GAA AGT 9348 Cys Gly Met TyrPhe Val Glu Asp Asn Ala Ser Asp Thr Val Glu Ser 3040 3045 3050 TCG AGCCTT CAG GGA GAG TTG GAA CCA GCA TCA TTT TCC TGG ACA TAT 9396 Ser Ser LeuGln Gly Glu Leu Glu Pro Ala Ser Phe Ser Trp Thr Tyr 3055 3060 3065 GAAGAA ATT AAA GAA GTT CAC AAG CGT TGG TGG CAA TTG AGA GAT AAT 9444 Glu GluIle Lys Glu Val His Lys Arg Trp Trp Gln Leu Arg Asp Asn 3070 3075 30803085 GCT GTA GAA ATC TTT CTA ACA AAT GGC AGA ACA CTC CTG TTG GCA TTT9492 Ala Val Glu Ile Phe Leu Thr Asn Gly Arg Thr Leu Leu Leu Ala Phe3090 3095 3100 GAT AAC ACC AAG GTT CGT GAT GAT GTA TAC CAC AAT ATA CTCACA AAT 9540 Asp Asn Thr Lys Val Arg Asp Asp Val Tyr His Asn Ile Leu ThrAsn 3105 3110 3115 AAC CTC CCT AAT CTT CTG GAA TAT GGT AAC ATC ACC GCTCTG ACA AAT 9588 Asn 
Leu Pro Asn Leu Leu Glu Tyr Gly Asn Ile Thr Ala LeuThr Asn 3120 3125 3130 TTA TGG TAT ACT GGG CAA ATT ACT AAT TTT GAA TATTTG ACT CAC TTA 9636 Leu Trp Tyr Thr Gly Gln Ile Thr Asn Phe Glu Tyr LeuThr His Leu 3135 3140 3145 AAC AAA CAT GCT GGC CGA TCC TTC AAT GAT CTCATG CAG TAT CCT GTG 9684 Asn Lys His Ala Gly Arg Ser Phe Asn Asp Leu MetGln Tyr Pro Val 3150 3155 3160 3165 TTC CCA TTT ATA CTT GCT GAC TAC GTTAGT GAG ACA CTT GAC CTC AAT 9732 Phe Pro Phe Ile Leu Ala Asp Tyr Val SerGlu Thr Leu Asp Leu Asn 3170 3175 3180 GAT CTG TTG ATA TAC AGA AAT CTCTCT AAA CCT ATA GCT GTT CAG TAT 9780 Asp Leu Leu Ile Tyr Arg Asn Leu SerLys Pro Ile Ala Val Gln Tyr 3185 3190 3195 AAA GAA AAA GAA GAT CGT TATGTG GAC ACA TAC AAG TAC TTG GAG GAA 9828 Lys Glu Lys Glu Asp Arg Tyr ValAsp Thr Tyr Lys Tyr Leu Glu Glu 3200 3205 3210 GAG TAC CGC AAA GGA GCCAGA GAA GAT GAC CCC ATG CCT CCC GTG CAG 9876 Glu Tyr Arg Lys Gly Ala ArgGlu Asp Asp Pro Met Pro Pro Val Gln 3215 3220 3225 CCC TAT CAC TAT GGCTCC CAC TAT TCC AAT AGC GGC ACT GTG CTT CAC 9924 Pro Tyr His Tyr Gly SerHis Tyr Ser Asn Ser Gly Thr Val Leu His 3230 3235 3240 3245 TTC CTG GTCAGG ATG CCT CCT TTC ACT AAA ATG TTT TTA GCC TAT CAA 9972 Phe Leu Val ArgMet Pro Pro Phe Thr Lys Met Phe Leu Ala Tyr Gln 3250 3255 3260 GAT CAAAGT TTT GAC ATT CCA GAC AGA ACT TTT CAT TCT ACA AAT ACA 10020 Asp GlnSer Phe Asp Ile Pro Asp Arg Thr Phe His Ser Thr Asn Thr 3265 3270 3275ACT TGG CGA CTC TCA TCT TTT GAA TCT ATG ACT GAT GTG AAA GAA CTT 10068Thr Trp Arg Leu Ser Ser Phe Glu Ser Met Thr Asp Val Lys Glu Leu 32803285 3290 ATC CCA GAG TTT TTC TAT CTT CCA GAG TTC CTA GTT AAC CGT GAAGGT 10116 Ile Pro Glu Phe Phe Tyr Leu Pro Glu Phe Leu Val Asn Arg GluGly 3295 3300 3305 TTT GAT TTT GGT GTG CGT CAG AAT GGT GAA CGG GTT AATCAC GTC AAC 10164 Phe Asp Phe Gly Val Arg Gln Asn Gly Glu Arg Val AsnHis Val Asn 3310 3315 3320 3325 CTT CCC CCT TGG GCG CGT AAT GAT CCT CGTCTT TTT ATC CTC ATC CAT 10212 Leu Pro Pro Trp Ala Arg Asn Asp Pro ArgLeu Phe Ile Leu Ile His 3330 3335 3340 CGG CAG 
GCT CTA GAG TCT GAC TACGTG TCG CAG AAC ATC TGT CAG TGG 10260 Arg Gln Ala Leu Glu Ser Asp TyrVal Ser Gln Asn Ile Cys Gln Trp 3345 3350 3355 ATT GAC TTG GTG TTT GGGTAT AAG CAA AAG GGG AAG GCT TCT GTT CAA 10308 Ile Asp Leu Val Phe GlyTyr Lys Gln Lys Gly Lys Ala Ser Val Gln 3360 3365 3370 GCG ATC AAT GTTTTT CAT CCT GCT ACA TAT TTT GGA ATG GAT GTC TCT 10356 Ala Ile Asn ValPhe His Pro Ala Thr Tyr Phe Gly Met Asp Val Ser 3375 3380 3385 GCA GTTGAA GAT CCA GTT CAG AGA CGA GCG CTA GAA ACC ATG ATA AAA 10404 Ala ValGlu Asp Pro Val Gln Arg Arg Ala Leu Glu Thr Met Ile Lys 3390 3395 34003405 ACC TAC GGG CAG ACT CCC CGT CAG CTG TTC CAC ATG GCC CAT GTG AGC10452 Thr Tyr Gly Gln Thr Pro Arg Gln Leu Phe His Met Ala His Val Ser3410 3415 3420 AGA CCT GGA GCC AAG CTC AAT ATT GAA GGA GAG CTT CCA GCTGCT GTG 10500 Arg Pro Gly Ala Lys Leu Asn Ile Glu Gly Glu Leu Pro AlaAla Val 3425 3430 3435 GGG TTG CTA GTG CAG TTT GCT TTC AGG GAG ACC CGAGAA CAG GTC AAA 10548 Gly Leu Leu Val Gln Phe Ala Phe Arg Glu Thr ArgGlu Gln Val Lys 3440 3445 3450 GAA ATC ACC TAT CCG AGT CCT TTG TCA TGGATA AAA GGC TTG AAA TGG 10596 Glu Ile Thr Tyr Pro Ser Pro Leu Ser TrpIle Lys Gly Leu Lys Trp 3455 3460 3465 GGG GAA TAC GTG GGT TCC CCC AGTGCT CCA GTA CCT GTG GTC TGC TTC 10644 Gly Glu Tyr Val Gly Ser Pro SerAla Pro Val Pro Val Val Cys Phe 3470 3475 3480 3485 AGC CAG CCC CAC GGAGAA AGA TTT GGC TCT CTC CAG GCT CTG CCC ACC 10692 Ser Gln Pro His GlyGlu Arg Phe Gly Ser Leu Gln Ala Leu Pro Thr 3490 3495 3500 AGA GCA ATCTGT GGT TTG TCA CGG AAT TTC TGT CTT GTG ATG ACA TAT 10740 Arg Ala IleCys Gly Leu Ser Arg Asn Phe Cys Leu Val Met Thr Tyr 3505 3510 3515 AGCAAG GAA CAA GGT GTG AGA AGC ATG AAC AGT ACG GAC ATT CAG TGG 10788 SerLys Glu Gln Gly Val Arg Ser Met Asn Ser Thr Asp Ile Gln Trp 3520 35253530 TCA GCC ATC CTG AGC TGG GGA TAT GCT GAT AAT ATT TTA AGG TTG AAG10836 Ser Ala Ile Leu Ser Trp Gly Tyr Ala Asp Asn Ile Leu Arg Leu Lys3535 3540 3545 AGT AAA CAA AGT GAG CCT CCA GTA AAC TTT ATT CAA AGT TCACAA CAG 10884 Ser Lys Gln Ser 
Glu Pro Pro Val Asn Phe Ile Gln Ser SerGln Gln 3550 3555 3560 3565 TAC CAG GTG ACT AGT TGT GCT TGG GTG CCT GACAGT TGC CAG CTG TTT 10932 Tyr Gln Val Thr Ser Cys Ala Trp Val Pro AspSer Cys Gln Leu Phe 3570 3575 3580 ACT GGA AGC AAA TGC GGT GTC ATC ACAGCC TAC ACA AAC AGA TTT ACA 10980 Thr Gly Ser Lys Cys Gly Val Ile ThrAla Tyr Thr Asn Arg Phe Thr 3585 3590 3595 AGC AGC ACG CCA TCA GAA ATAGAA ATG GAG ACT CAA ATA CAT CTC TAT 11028 Ser Ser Thr Pro Ser Glu IleGlu Met Glu Thr Gln Ile His Leu Tyr 3600 3605 3610 GGT CAC ACA GAA GAGATA ACC AGC TTA TTT GTT TGC AAA CCA TAC AGT 11076 Gly His Thr Glu GluIle Thr Ser Leu Phe Val Cys Lys Pro Tyr Ser 3615 3620 3625 ATA CTG ATAAGT GTG AGC AGA GAC GGA ACC TGC ATC ATA TGG GAT TTA 11124 Ile Leu IleSer Val Ser Arg Asp Gly Thr Cys Ile Ile Trp Asp Leu 3630 3635 3640 3645AAC AGG TTA TGC TAT GTA CAA AGT CTG GCG GGA CAC AAA AGC CCT GTC 11172Asn Arg Leu Cys Tyr Val Gln Ser Leu Ala Gly His Lys Ser Pro Val 36503655 3660 ACA GCT GTC TCT GCC AGT GAA ACC TCA GGT GAT ATT GCT ACT GTGTGT 11220 Thr Ala Val Ser Ala Ser Glu Thr Ser Gly Asp Ile Ala Thr ValCys 3665 3670 3675 GAT TCA GCT GGC GGA GGC AGT GAC CTC AGA CTC TGG ACGGTG AAC GGG 11268 Asp Ser Ala Gly Gly Gly Ser Asp Leu Arg Leu Trp ThrVal Asn Gly 3680 3685 3690 GAT CTC GTT GGA CAT GTC CAC TGC AGG GAG ATCATC TGT TCC GTG GCT 11316 Asp Leu Val Gly His Val His Cys Arg Glu IleIle Cys Ser Val Ala 3695 3700 3705 TTC TCC AAC CAG CCT GAG GGA GTA TCTATC AAT GTA ATC GCT GGG GGA 11364 Phe Ser Asn Gln Pro Glu Gly Val SerIle Asn Val Ile Ala Gly Gly 3710 3715 3720 3725 TTA GAA AAT GGA ATT GTTAGG TTA TGG AGC ACA TGG GAC TTA AAG CCT 11412 Leu Glu Asn Gly Ile ValArg Leu Trp Ser Thr Trp Asp Leu Lys Pro 3730 3735 3740 GTG AGA GAA ATTACA TTT CCC AAA TCA AAT AAG CCC ATC ATC AGC CTT 11460 Val Arg Glu IleThr Phe Pro Lys Ser Asn Lys Pro Ile Ile Ser Leu 3745 3750 3755 ACA TTTTCT TGT GAT GGC CAC CAT TTG TAC ACA GCA AAC AGT GAT GGG 11508 Thr PheSer Cys Asp Gly His His Leu Tyr Thr Ala Asn Ser Asp Gly 3760 3765 3770ACC GTG ATT 
GCC TGG TGT CGG AAG GAC CAG CAC CGC TTG AAA CAG CCA 11556Thr Val Ile Ala Trp Cys Arg Lys Asp Gln His Arg Leu Lys Gln Pro 37753780 3785 ATG TTC TAT TCC TTC CTT AGC AGC TAT GCA GCC GGG TGAATGCGAA11602 Met Phe Tyr Ser Phe Leu Ser Ser Tyr Ala Ala Gly 3790 3795 3800TGAACTTCAT GTTCTCCAAA GCACTTTAAC TCCAAACTAG ATTTGTTGAC TTCACCAGTT 11662TTAGGAGGTT GAACCTAAAG AAATGGATGA CTGGACAAAC CATCCAAATA ATGATAAAGT 11722CTATTCATCT GCACAAAATT CTGAAGAGTC ACATGATCCT AAGAGGAAAG TTCTGTTCTA 11782TTTTAGTGAT AATCTGGAAG ATTGTGTCAA TATGCACTAG CCAACAAGTT TTAAGCCTGG 11842CATGGTACAT TAAAATGATA TTCTTAAAAT TTTTTCCCAC CAAGGTATTC CAAAGAAAAT 11902ATTAAGGTCT CCCCTTTTTC TATGATTCCA AAAGGACCAG TAGAATTTAA ATTGGTTGGT 11962TGATNGTTTA TATAAAACAC ACTAAAATTA TATTTTAAAA GTTTANTGCCN TGAAATACT 12022CCTCCCACCA CACACACATG CTCCAAAAGA GGAAAGAAAA AAAGATAATTT TTAGGACTT 12082GATAATTGCT TTCTTTGAGA AGCAAATTAT TCAGTAGGTG CCTCTGTACCA AATATTTTA 12142TGGAATATCT AAATACTAAA ATAAACTATG AATGAATCTC AAAATTAGGCA GTTTTTGCC 12202TGCTTTCTTA GCTCAAAGGA GAACCAGAAT TTTTTTGACA GCCACAAACA AGAATACAGG 12262AGTTATCTTG GATTTCAGAC ACATTCTGTT TCTTCATAAA AATTTTACTT AAAATCTGTA 12322ACGCTAGATA TTGACTATCC TTAGTTGAGT CACTGAGGTT TAAACACAAT GGTAAGTCTT 12382AAAGTCTGCT ATTTACAGAG CATTGAATCT GTACCAATTT GCAATAGAAA GCCTTCAGTA 12442TGCAAGAAGT TTGCATGGGT ATTAAGAACA CAGCCTAAAT AAGGCATTTG ATCTAATCTG 12502CAGGAAGAAT TTTCTTCCCC AAAACAGAAT TATAAAAGCT TACTTTAAAC AGGAGGCAGA 12562ATAATTCTTT TAGGAAACCA TTTCATTCTG TTTCTACTAA CCTATACCAT CTGA 12616 3801amino acids amino acid unknown protein 10 Met Ser Thr Asp Ser Asn SerLeu Ala Arg Glu Phe Leu Thr Asp Val 1 5 10 15 Asn Arg Leu Cys Asn AlaVal Val Gln Arg Val Glu Ala Arg Glu Glu 20 25 30 Glu Glu Glu Glu Thr HisMet Ala Thr Leu Gly Gln Tyr Leu Val His 35 40 45 Gly Arg Gly Phe Leu LeuLeu Thr Lys Leu Asn Ser Ile Ile Asp Gln 50 55 60 Ala Leu Thr Cys Arg GluGlu Leu Leu Thr Leu Leu Leu Ser Leu Leu 65 70 75 80 Pro Leu Val Trp LysIle Pro Val Gln Glu Glu Lys Ala Thr Asp Phe 85 90 95 Asn Leu Pro Leu SerAla Asp Ile Ile Leu Thr Lys Glu Lys 
Asn Ser 100 105 110 Ser Ser Gln ArgSer Thr Gln Glu Lys Leu His Leu Glu Gly Ser Ala 115 120 125 Leu Ser SerGln Val Ser Ala Lys Val Asn Val Phe Arg Lys Ser Arg 130 135 140 Arg GlnArg Lys Ile Thr His Arg Tyr Ser Val Arg Asp Ala Arg Lys 145 150 155 160Thr Gln Leu Ser Thr Ser Asp Ser Glu Ala Asn Ser Asp Glu Lys Gly 165 170175 Ile Ala Met Asn Lys His Arg Arg Pro His Leu Leu His His Phe Leu 180185 190 Thr Ser Phe Pro Lys Gln Asp His Pro Lys Ala Lys Leu Asp Arg Leu195 200 205 Ala Thr Lys Glu Gln Thr Pro Pro Asp Ala Met Ala Leu Glu AsnSer 210 215 220 Arg Glu Ile Ile Pro Arg Gln Gly Ser Asn Thr Asp Ile LeuSer Glu 225 230 235 240 Pro Ala Ala Leu Ser Val Ile Ser Asn Met Asn AsnSer Pro Phe Asp 245 250 255 Leu Cys His Val Leu Leu Ser Leu Leu Glu LysVal Cys Lys Phe Asp 260 265 270 Val Thr Leu Asn His Asn Ser Pro Leu AlaAla Ser Val Val Pro Thr 275 280 285 Leu Thr Glu Phe Leu Ala Gly Phe GlyAsp Cys Cys Ser Leu Ser Asp 290 295 300 Asn Leu Glu Ser Arg Val Val SerAla Gly Trp Thr Glu Glu Pro Val 305 310 315 320 Ala Leu Ile Gln Arg MetLeu Phe Arg Thr Val Leu His Leu Leu Ser 325 330 335 Val Asp Val Ser ThrAla Glu Met Met Pro Glu Asn Leu Arg Lys Asn 340 345 350 Leu Thr Glu LeuLeu Arg Ala Ala Leu Lys Ile Arg Ile Cys Leu Glu 355 360 365 Lys Gln ProAsp Pro Phe Ala Pro Arg Gln Lys Lys Thr Leu Gln Glu 370 375 380 Val GlnGlu Asp Phe Val Phe Ser Lys Tyr Arg His Arg Ala Leu Leu 385 390 395 400Leu Pro Glu Leu Leu Glu Gly Val Leu Gln Ile Leu Ile Cys Cys Leu 405 410415 Gln Ser Ala Ala Ser Asn Pro Phe Tyr Phe Ser Gln Ala Met Asp Leu 420425 430 Val Gln Glu Phe Ile Gln His His Gly Phe Asn Leu Phe Glu Thr Ala435 440 445 Val Leu Gln Met Glu Trp Leu Val Leu Arg Asp Gly Val Pro ProGlu 450 455 460 Ala Ser Glu His Leu Lys Ala Leu Ile Asn Ser Val Met LysIle Met 465 470 475 480 Ser Thr Val Lys Lys Val Lys Ser Glu Gln Leu HisHis Ser Met Cys 485 490 495 Thr Arg Lys Arg His Arg Arg Cys Glu Tyr SerHis Phe Met His His 500 505 510 His Arg Asp Leu Ser Gly Leu Leu Val SerAla Phe Lys Asn Gln Val 515 520 525 Ser Lys Asn 
Pro Phe Glu Glu Thr AlaAsp Gly Asp Val Tyr Tyr Pro 530 535 540 Glu Arg Cys Cys Cys Ile Ala ValCys Ala His Gln Cys Leu Arg Leu 545 550 555 560 Leu Gln Gln Ala Ser LeuSer Ser Thr Cys Val Gln Ile Leu Ser Gly 565 570 575 Val His Asn Ile GlyIle Cys Cys Cys Met Asp Pro Lys Ser Val Ile 580 585 590 Ile Pro Leu LeuHis Ala Phe Lys Leu Pro Ala Leu Lys Asn Phe Gln 595 600 605 Gln His IleLeu Asn Ile Leu Asn Lys Leu Ile Leu Asp Gln Leu Gly 610 615 620 Gly AlaGlu Ile Ser Pro Lys Ile Lys Lys Ala Ala Cys Asn Ile Cys 625 630 635 640Thr Val Asp Ser Asp Gln Leu Ala Gln Leu Glu Glu Thr Leu Gln Gly 645 650655 Asn Leu Cys Asp Ala Glu Leu Ser Ser Ser Leu Ser Ser Pro Ser Tyr 660665 670 Arg Phe Gln Gly Ile Leu Pro Ser Ser Gly Ser Glu Asp Leu Leu Trp675 680 685 Lys Trp Asp Ala Leu Lys Ala Tyr Gln Asn Phe Val Phe Glu GluAsp 690 695 700 Arg Leu His Ser Ile Gln Ile Ala Asn His Ile Cys Asn LeuIle Gln 705 710 715 720 Lys Gly Asn Ile Val Val Gln Trp Lys Leu Tyr AsnTyr Ile Phe Asn 725 730 735 Pro Val Leu Gln Arg Gly Val Glu Leu Ala HisHis Cys Gln His Leu 740 745 750 Ser Val Thr Ser Ala Gln Ser His Val CysSer His His Asn Gln Cys 755 760 765 Leu Pro Gln Asp Val Leu Gln Ile TyrVal Lys Thr Leu Pro Ile Leu 770 775 780 Leu Lys Ser Arg Val Ile Arg AspLeu Phe Leu Ser Cys Asn Gly Val 785 790 795 800 Ser Gln Ile Ile Glu LeuAsn Cys Leu Asn Gly Ile Arg Ser His Ser 805 810 815 Leu Lys Ala Phe GluThr Leu Ile Ile Ser Leu Gly Glu Gln Gln Lys 820 825 830 Asp Ala Ser ValPro Asp Ile Asp Gly Ile Asp Ile Glu Gln Lys Glu 835 840 845 Leu Ser SerVal His Val Gly Thr Ser Phe His His Gln Gln Ala Tyr 850 855 860 Ser AspSer Pro Gln Ser Leu Ser Lys Phe Tyr Ala Gly Leu Lys Glu 865 870 875 880Ala Tyr Pro Lys Arg Arg Lys Thr Val Asn Gln Asp Val His Ile Asn 885 890895 Thr Ile Asn Leu Phe Leu Cys Val Ala Phe Leu Cys Val Ser Lys Glu 900905 910 Ala Glu Ser Asp Arg Glu Ser Ala Asn Asp Ser Glu Asp Thr Ser Gly915 920 925 Tyr Asp Ser Thr Ala Ser Glu Pro Leu Ser His Met Leu Pro CysIle 930 935 940 Ser Leu Glu Ser Leu Val Leu Pro Ser Pro Glu 
His Met HisGln Ala 945 950 955 960 Ala Asp Ile Trp Ser Met Cys Arg Trp Ile Tyr MetLeu Ser Ser Val 965 970 975 Phe Gln Lys Gln Phe Tyr Arg Leu Gly Gly PheArg Val Cys His Lys 980 985 990 Leu Ile Phe Met Ile Ile Gln Lys Leu PheArg Ser His Lys Glu Glu 995 1000 1005 Gln Gly Lys Lys Glu Gly Asp ThrSer Val Asn Glu Asn Gln Asp Leu 1010 1015 1020 Asn Arg Ile Ser Gln ProLys Arg Thr Met Lys Glu Asp Leu Leu Ser 1025 1030 1035 1040 Leu Ala IleLys Ser Asp Pro Ile Pro Ser Glu Leu Gly Ser Leu Lys 1045 1050 1055 LysSer Ala Asp Ser Leu Gly Lys Leu Glu Leu Gln His Ile Ser Ser 1060 10651070 Ile Asn Val Glu Glu Val Ser Ala Thr Glu Ala Ala Pro Glu Glu Ala1075 1080 1085 Lys Leu Phe Thr Ser Gln Glu Ser Glu Thr Ser Leu Gln SerIle Arg 1090 1095 1100 Leu Leu Glu Ala Leu Leu Ala Ile Cys Leu His GlyAla Arg Thr Ser 1105 1110 1115 1120 Gln Gln Lys Met Glu Leu Glu Leu ProAsn Gln Asn Leu Ser Val Glu 1125 1130 1135 Ser Ile Leu Phe Glu Met ArgAsp His Leu Ser Gln Ser Lys Val Ile 1140 1145 1150 Glu Thr Gln Leu AlaLys Pro Leu Phe Asp Ala Leu Leu Arg Val Ala 1155 1160 1165 Leu Gly AsnTyr Ser Ala Asp Phe Glu His Asn Asp Ala Met Thr Glu 1170 1175 1180 LysSer His Gln Ser Ala Glu Glu Leu Ser Ser Gln Pro Gly Asp Phe 1185 11901195 1200 Ser Glu Glu Ala Glu Asp Ser Gln Cys Cys Ser Phe Lys Leu LeuVal 1205 1210 1215 Glu Glu Glu Gly Tyr Glu Ala Asp Ser Glu Ser Asn ProGlu Asp Gly 1220 1225 1230 Glu Thr Gln Asp Asp Gly Val Asp Leu Lys SerGlu Thr Glu Gly Phe 1235 1240 1245 Ser Ala Ser Ser Ser Pro Asn Asp LeuLeu Glu Asn Leu Thr Gln Gly 1250 1255 1260 Glu Ile Ile Tyr Pro Glu IleCys Met Leu Glu Leu Asn Leu Leu Ser 1265 1270 1275 1280 Ala Ser Lys AlaLys Leu Asp Val Leu Ala His Val Phe Glu Ser Phe 1285 1290 1295 Leu LysIle Ile Arg Gln Lys Glu Lys Asn Val Phe Leu Leu Met Gln 1300 1305 1310Gln Gly Thr Val Lys Asn Leu Leu Gly Gly Phe Leu Ser Ile Leu Thr 13151320 1325 Gln Asp Asp Ser Asp Phe Gln Ala Cys Gln Arg Val Leu Val AspLeu 1330 1335 1340 Leu Val Ser Leu Met Ser Ser Arg Thr Cys Ser Glu GluLeu Thr Leu 1345 1350 1355 1360 
Leu Leu Arg Ile Phe Leu Glu Lys Ser ProCys Thr Lys Ile Leu Leu 1365 1370 1375 Leu Gly Ile Leu Lys Ile Ile GluSer Asp Thr Thr Met Ser Pro Ser 1380 1385 1390 Gln Tyr Leu Thr Phe ProLeu Leu His Ala Pro Asn Leu Ser Asn Gly 1395 1400 1405 Val Ser Ser GlnLys Tyr Pro Gly Ile Leu Asn Ser Lys Ala Met Gly 1410 1415 1420 Leu LeuArg Arg Ala Arg Val Ser Arg Ser Lys Lys Glu Ala Asp Arg 1425 1430 14351440 Glu Ser Phe Pro His Arg Leu Leu Ser Ser Trp His Ile Ala Pro Val1445 1450 1455 His Leu Pro Leu Leu Gly Gln Asn Cys Trp Pro His Leu SerGlu Gly 1460 1465 1470 Phe Ser Val Ser Leu Trp Phe Asn Val Glu Cys IleHis Glu Ala Glu 1475 1480 1485 Ser Thr Thr Glu Lys Gly Lys Lys Ile LysLys Arg Asn Lys Ser Leu 1490 1495 1500 Ile Leu Pro Asp Ser Ser Phe AspGly Thr Glu Ser Asp Arg Pro Glu 1505 1510 1515 1520 Gly Ala Glu Tyr IleAsn Pro Gly Glu Arg Leu Ile Glu Glu Gly Cys 1525 1530 1535 Ile His IleIle Ser Leu Gly Ser Lys Ala Leu Met Ile Gln Val Trp 1540 1545 1550 AlaAsp Pro His Asn Ala Thr Leu Ile Phe Arg Val Cys Met Asp Ser 1555 15601565 Asn Asp Asp Met Lys Ala Val Leu Leu Ala Gln Val Glu Ser Gln Glu1570 1575 1580 Asn Ile Phe Leu Pro Ser Lys Trp Gln His Leu Val Leu ThrTyr Leu 1585 1590 1595 1600 Gln Gln Pro Gln Gly Lys Arg Arg Ile His GlyLys Ile Ser Ile Trp 1605 1610 1615 Val Ser Gly Gln Arg Lys Pro Asp ValThr Leu Asp Phe Met Leu Pro 1620 1625 1630 Arg Lys Thr Ser Leu Ser SerAsp Ser Asn Lys Thr Phe Cys Met Ile 1635 1640 1645 Gly His Cys Leu SerSer Gln Glu Glu Phe Leu Gln Leu Ala Gly Lys 1650 1655 1660 Trp Asp LeuGly Asn Leu Leu Leu Phe Asn Gly Ala Lys Val Gly Ser 1665 1670 1675 1680Gln Glu Ala Phe Tyr Leu Tyr Ala Cys Gly Pro Asn His Thr Ser Val 16851690 1695 Met Pro Cys Lys Tyr Gly Lys Pro Val Asn Asp Tyr Ser Lys TyrIle 1700 1705 1710 Asn Lys Glu Ile Leu Arg Cys Glu Gln Ile Arg Glu LeuPhe Met Thr 1715 1720 1725 Lys Lys Asp Val Asp Ile Gly Leu Leu Ile GluSer Leu Ser Val Val 1730 1735 1740 Tyr Thr Thr Tyr Cys Pro Ala Gln TyrThr Ile Tyr Glu Pro Val Ile 1745 1750 1755 1760 Arg Leu Lys Gly Gln MetLys 
Thr Gln Leu Ser Gln Arg Pro Phe Ser 1765 1770 1775 Ser Lys Glu ValGln Ser Ile Leu Leu Glu Pro His His Leu Lys Asn 1780 1785 1790 Leu GlnPro Thr Glu Tyr Lys Thr Ile Gln Gly Ile Leu His Glu Ile 1795 1800 1805Gly Gly Thr Gly Ile Phe Val Phe Leu Phe Ala Arg Val Val Glu Leu 18101815 1820 Ser Ser Cys Glu Glu Thr Gln Ala Leu Ala Leu Arg Val Ile LeuSer 1825 1830 1835 1840 Leu Ile Lys Tyr Asn Gln Gln Arg Val His Glu LeuGlu Asn Cys Asn 1845 1850 1855 Gly Leu Ser Met Ile His Gln Val Leu IleLys Gln Lys Cys Ile Val 1860 1865 1870 Gly Phe Tyr Ile Leu Lys Thr LeuLeu Glu Gly Cys Cys Gly Glu Asp 1875 1880 1885 Ile Ile Tyr Met Asn GluAsn Gly Glu Phe Lys Leu Asp Val Asp Ser 1890 1895 1900 Asn Ala Ile IleGln Asp Val Lys Leu Leu Glu Glu Leu Leu Leu Asp 1905 1910 1915 1920 TrpLys Ile Trp Ser Lys Ala Glu Gln Gly Val Trp Glu Thr Leu Leu 1925 19301935 Ala Ala Leu Glu Val Leu Ile Arg Ala Asp His His Gln Gln Met Phe1940 1945 1950 Asn Ile Lys Gln Leu Leu Lys Ala Gln Val Val His His PheLeu Leu 1955 1960 1965 Thr Cys Gln Val Leu Gln Glu Tyr Lys Glu Gly GlnLeu Thr Pro Met 1970 1975 1980 Pro Arg Glu Val Cys Arg Ser Phe Val LysIle Ile Ala Glu Val Leu 1985 1990 1995 2000 Gly Ser Pro Pro Asp Leu GluLeu Leu Thr Ile Ile Phe Asn Phe Leu 2005 2010 2015 Leu Ala Val His ProPro Thr Asn Thr Tyr Val Cys His Asn Pro Thr 2020 2025 2030 Asn Phe TyrPhe Ser Leu His Ile Asp Gly Lys Ile Phe Gln Glu Lys 2035 2040 2045 ValArg Ser Ile Met Tyr Leu Arg His Ser Ser Ser Gly Gly Arg Ser 2050 20552060 Leu Met Ser Pro Gly Phe Met Val Ile Ser Pro Ser Gly Phe Thr Ala2065 2070 2075 2080 Ser Pro Tyr Glu Gly Glu Asn Ser Ser Asn Ile Ile ProGln Gln Met 2085 2090 2095 Ala Ala His Met Leu Arg Ser Arg Ser Leu ProAla Phe Pro Thr Ser 2100 2105 2110 Ser Leu Leu Thr Gln Ser Gln Lys LeuThr Gly Ser Leu Gly Cys Ser 2115 2120 2125 Ile Asp Arg Leu Gln Asn IleAla Asp Thr Tyr Val Ala Thr Gln Ser 2130 2135 2140 Lys Lys Gln Asn SerLeu Gly Ser Ser Asp Thr Leu Lys Lys Gly Lys 2145 2150 2155 2160 Glu AspAla Phe Ile Ser Ser Cys Glu Ser Ala Lys Thr Val 
Cys Glu 2165 2170 2175Met Glu Ala Val Leu Ser Ala Gln Val Ser Val Ser Asp Val Pro Lys 21802185 2190 Gly Val Leu Gly Phe Pro Val Val Lys Ala Asp His Lys Gln LeuGly 2195 2200 2205 Ala Glu Pro Arg Ser Glu Asp Asp Ser Pro Gly Asp GluSer Cys Pro 2210 2215 2220 Arg Arg Pro Asp Tyr Leu Lys Gly Leu Ala SerPhe Gln Arg Ser His 2225 2230 2235 2240 Ser Thr Ile Ala Ser Leu Gly LeuAla Phe Pro Ser Gln Asn Gly Ser 2245 2250 2255 Ala Ala Val Gly Arg TrpPro Ser Leu Val Asp Arg Asn Thr Asp Asp 2260 2265 2270 Trp Glu Asn PheAla Tyr Ser Leu Gly Tyr Glu Pro Asn Tyr Asn Arg 2275 2280 2285 Thr AlaSer Ala His Ser Val Thr Glu Asp Cys Leu Val Pro Ile Cys 2290 2295 2300Cys Gly Leu Tyr Glu Leu Leu Ser Gly Val Leu Leu Ile Leu Pro Asp 23052310 2315 2320 Val Leu Leu Glu Asp Val Met Asp Lys Leu Ile Gln Ala AspThr Leu 2325 2330 2335 Leu Val Leu Val Asn His Pro Ser Pro Ala Ile GlnGln Gly Val Ile 2340 2345 2350 Lys Leu Leu Asp Ala Tyr Phe Ala Arg AlaSer Lys Glu Gln Lys Asp 2355 2360 2365 Lys Phe Leu Lys Asn Arg Gly PheSer Leu Leu Ala Asn Gln Leu Tyr 2370 2375 2380 Leu His Arg Gly Thr GlnGlu Leu Leu Glu Cys Phe Ile Glu Met Phe 2385 2390 2395 2400 Phe Gly ArgHis Ile Gly Leu Asp Glu Glu Phe Asp Leu Glu Asp Val 2405 2410 2415 ArgAsn Met Gly Leu Phe Gln Lys Trp Ser Val Ile Pro Ile Leu Gly 2420 24252430 Leu Ile Glu Thr Ser Leu Tyr Asp Asn Ile Leu Leu His Asn Ala Leu2435 2440 2445 Leu Leu Leu Leu Gln Ile Leu Asn Ser Cys Ser Lys Val AlaAsp Met 2450 2455 2460 Leu Leu Asp Asn Gly Leu Leu Tyr Val Leu Cys AsnThr Val Ala Ala 2465 2470 2475 2480 Leu Asn Gly Leu Glu Lys Asn Ile ProMet Ser Glu Tyr Lys Leu Leu 2485 2490 2495 Ala Cys Asp Ile Gln Gln LeuPhe Ile Ala Val Thr Ile His Ala Cys 2500 2505 2510 Ser Ser Ser Gly SerGln Tyr Phe Arg Val Ile Glu Asp Leu Ile Val 2515 2520 2525 Met Leu GlyTyr Leu Gln Asn Ser Lys Asn Lys Arg Thr Gln Asn Met 2530 2535 2540 AlaVal Ala Leu Gln Leu Arg Val Leu Gln Ala Ala Met Glu Phe Ile 2545 25502555 2560 Arg Thr Thr Ala Asn His Asp Ser Glu Asn Leu Thr Asp Ser LeuGln 2565 2570 2575 Ser 
Pro Ser Ala Pro His His Ala Val Val Gln Lys ArgLys Ser Ile 2580 2585 2590 Ala Gly Pro Arg Lys Phe Pro Leu Ala Gln ThrGlu Ser Leu Leu Met 2595 2600 2605 Lys Met Arg Ser Val Ala Asn Asp GluLeu His Val Met Met Gln Arg 2610 2615 2620 Arg Met Ser Gln Glu Asn ProSer Gln Ala Thr Glu Thr Glu Leu Ala 2625 2630 2635 2640 Gln Arg Leu GlnArg Leu Thr Val Leu Ala Val Asn Arg Ile Ile Tyr 2645 2650 2655 Gln GluPhe Asn Ser Asp Ile Ile Asp Ile Leu Arg Thr Pro Glu Asn 2660 2665 2670Val Thr Gln Ser Lys Thr Ser Val Phe Gln Thr Glu Ile Ser Glu Glu 26752680 2685 Asn Ile His His Glu Gln Ser Ser Val Phe Asn Pro Phe Gln LysGlu 2690 2695 2700 Ile Phe Thr Tyr Leu Val Glu Gly Phe Lys Val Ser IleGly Ser Ser 2705 2710 2715 2720 Lys Ala Ser Gly Ser Lys Gln Gln Trp ThrLys Ile Leu Trp Ser Cys 2725 2730 2735 Lys Glu Thr Phe Arg Met Gln LeuGly Arg Leu Leu Val His Ile Leu 2740 2745 2750 Ser Pro Ala His Ala AlaGln Glu Arg Lys Gln Ile Phe Glu Ile Val 2755 2760 2765 His Glu Pro AsnHis Gln Glu Ile Leu Arg Asp Cys Leu Ser Pro Ser 2770 2775 2780 Leu GlnHis Gly Ala Lys Leu Val Leu Tyr Leu Ser Glu Leu Ile His 2785 2790 27952800 Asn His Gln Gly Glu Leu Thr Glu Glu Glu Leu Gly Thr Ala Glu Leu2805 2810 2815 Leu Met Asn Ala Leu Lys Leu Cys Gly His Lys Cys Ile ProPro Ser 2820 2825 2830 Ala Ser Thr Lys Ala Asp Leu Ile Lys Met Ile LysGlu Glu Gln Lys 2835 2840 2845 Lys Tyr Glu Thr Glu Glu Gly Val Asn LysAla Ala Trp Gln Lys Thr 2850 2855 2860 Val Asn Asn Asn Gln Gln Ser LeuPhe Gln Arg Leu Asp Ser Lys Ser 2865 2870 2875 2880 Lys Asp Ile Ser LysIle Ala Ala Asp Ile Thr Gln Ala Val Ser Leu 2885 2890 2895 Ser Gln GlyAsn Glu Arg Lys Lys Val Ile Gln His Ile Arg Gly Met 2900 2905 2910 TyrLys Val Asp Leu Ser Ala Ser Arg His Trp Gln Glu Leu Ile Gln 2915 29202925 Gln Leu Thr His Asp Arg Ala Val Trp Tyr Asp Pro Ile Tyr Tyr Pro2930 2935 2940 Thr Ser Trp Gln Leu Asp Pro Thr Glu Gly Pro Asn Arg GluArg Arg 2945 2950 2955 2960 Arg Leu Gln Arg Cys Tyr Leu Thr Ile Pro AsnLys Tyr Leu Leu Arg 2965 2970 2975 Asp Arg Gln Lys Ser Glu Asp Val 
ValLys Pro Pro Leu Ser Tyr Leu 2980 2985 2990 Phe Glu Asp Lys Thr His SerSer Phe Ser Ser Thr Val Lys Asp Lys 2995 3000 3005 Ala Ala Ser Glu SerIle Arg Val Asn Arg Arg Cys Ile Ser Val Ala 3010 3015 3020 Pro Ser ArgGlu Thr Ala Gly Glu Leu Leu Leu Gly Lys Cys Gly Met 3025 3030 3035 3040Tyr Phe Val Glu Asp Asn Ala Ser Asp Thr Val Glu Ser Ser Ser Leu 30453050 3055 Gln Gly Glu Leu Glu Pro Ala Ser Phe Ser Trp Thr Tyr Glu GluIle 3060 3065 3070 Lys Glu Val His Lys Arg Trp Trp Gln Leu Arg Asp AsnAla Val Glu 3075 3080 3085 Ile Phe Leu Thr Asn Gly Arg Thr Leu Leu LeuAla Phe Asp Asn Thr 3090 3095 3100 Lys Val Arg Asp Asp Val Tyr His AsnIle Leu Thr Asn Asn Leu Pro 3105 3110 3115 3120 Asn Leu Leu Glu Tyr GlyAsn Ile Thr Ala Leu Thr Asn Leu Trp Tyr 3125 3130 3135 Thr Gly Gln IleThr Asn Phe Glu Tyr Leu Thr His Leu Asn Lys His 3140 3145 3150 Ala GlyArg Ser Phe Asn Asp Leu Met Gln Tyr Pro Val Phe Pro Phe 3155 3160 3165Ile Leu Ala Asp Tyr Val Ser Glu Thr Leu Asp Leu Asn Asp Leu Leu 31703175 3180 Ile Tyr Arg Asn Leu Ser Lys Pro Ile Ala Val Gln Tyr Lys GluLys 3185 3190 3195 3200 Glu Asp Arg Tyr Val Asp Thr Tyr Lys Tyr Leu GluGlu Glu Tyr Arg 3205 3210 3215 Lys Gly Ala Arg Glu Asp Asp Pro Met ProPro Val Gln Pro Tyr His 3220 3225 3230 Tyr Gly Ser His Tyr Ser Asn SerGly Thr Val Leu His Phe Leu Val 3235 3240 3245 Arg Met Pro Pro Phe ThrLys Met Phe Leu Ala Tyr Gln Asp Gln Ser 3250 3255 3260 Phe Asp Ile ProAsp Arg Thr Phe His Ser Thr Asn Thr Thr Trp Arg 3265 3270 3275 3280 LeuSer Ser Phe Glu Ser Met Thr Asp Val Lys Glu Leu Ile Pro Glu 3285 32903295 Phe Phe Tyr Leu Pro Glu Phe Leu Val Asn Arg Glu Gly Phe Asp Phe3300 3305 3310 Gly Val Arg Gln Asn Gly Glu Arg Val Asn His Val Asn LeuPro Pro 3315 3320 3325 Trp Ala Arg Asn Asp Pro Arg Leu Phe Ile Leu IleHis Arg Gln Ala 3330 3335 3340 Leu Glu Ser Asp Tyr Val Ser Gln Asn IleCys Gln Trp Ile Asp Leu 3345 3350 3355 3360 Val Phe Gly Tyr Lys Gln LysGly Lys Ala Ser Val Gln Ala Ile Asn 3365 3370 3375 Val Phe His Pro AlaThr Tyr Phe Gly Met Asp Val Ser Ala Val 
Glu 3380 3385 3390 Asp Pro ValGln Arg Arg Ala Leu Glu Thr Met Ile Lys Thr Tyr Gly 3395 3400 3405 GlnThr Pro Arg Gln Leu Phe His Met Ala His Val Ser Arg Pro Gly 3410 34153420 Ala Lys Leu Asn Ile Glu Gly Glu Leu Pro Ala Ala Val Gly Leu Leu3425 3430 3435 3440 Val Gln Phe Ala Phe Arg Glu Thr Arg Glu Gln Val LysGlu Ile Thr 3445 3450 3455 Tyr Pro Ser Pro Leu Ser Trp Ile Lys Gly LeuLys Trp Gly Glu Tyr 3460 3465 3470 Val Gly Ser Pro Ser Ala Pro Val ProVal Val Cys Phe Ser Gln Pro 3475 3480 3485 His Gly Glu Arg Phe Gly SerLeu Gln Ala Leu Pro Thr Arg Ala Ile 3490 3495 3500 Cys Gly Leu Ser ArgAsn Phe Cys Leu Val Met Thr Tyr Ser Lys Glu 3505 3510 3515 3520 Gln GlyVal Arg Ser Met Asn Ser Thr Asp Ile Gln Trp Ser Ala Ile 3525 3530 3535Leu Ser Trp Gly Tyr Ala Asp Asn Ile Leu Arg Leu Lys Ser Lys Gln 35403545 3550 Ser Glu Pro Pro Val Asn Phe Ile Gln Ser Ser Gln Gln Tyr GlnVal 3555 3560 3565 Thr Ser Cys Ala Trp Val Pro Asp Ser Cys Gln Leu PheThr Gly Ser 3570 3575 3580 Lys Cys Gly Val Ile Thr Ala Tyr Thr Asn ArgPhe Thr Ser Ser Thr 3585 3590 3595 3600 Pro Ser Glu Ile Glu Met Glu ThrGln Ile His Leu Tyr Gly His Thr 3605 3610 3615 Glu Glu Ile Thr Ser LeuPhe Val Cys Lys Pro Tyr Ser Ile Leu Ile 3620 3625 3630 Ser Val Ser ArgAsp Gly Thr Cys Ile Ile Trp Asp Leu Asn Arg Leu 3635 3640 3645 Cys TyrVal Gln Ser Leu Ala Gly His Lys Ser Pro Val Thr Ala Val 3650 3655 3660Ser Ala Ser Glu Thr Ser Gly Asp Ile Ala Thr Val Cys Asp Ser Ala 36653670 3675 3680 Gly Gly Gly Ser Asp Leu Arg Leu Trp Thr Val Asn Gly AspLeu Val 3685 3690 3695 Gly His Val His Cys Arg Glu Ile Ile Cys Ser ValAla Phe Ser Asn 3700 3705 3710 Gln Pro Glu Gly Val Ser Ile Asn Val IleAla Gly Gly Leu Glu Asn 3715 3720 3725 Gly Ile Val Arg Leu Trp Ser ThrTrp Asp Leu Lys Pro Val Arg Glu 3730 3735 3740 Ile Thr Phe Pro Lys SerAsn Lys Pro Ile Ile Ser Leu Thr Phe Ser 3745 3750 3755 3760 Cys Asp GlyHis His Leu Tyr Thr Ala Asn Ser Asp Gly Thr Val Ile 3765 3770 3775 AlaTrp Cys Arg Lys Asp Gln His Arg Leu Lys Gln Pro Met Phe Tyr 3780 37853790 Ser Phe 
Leu Ser Ser Tyr Ala Ala Gly 3795 3800

12225 base pairs nucleic acid single linear DNA CDS 190..11208 11

GCGGCCGCGT CGACGCGGCG GCGGCAGCGG CGTCGGCTCG GGGTTCTCCG GGAGAGGGGG 60
AGTGCGCGGC GGCCGCAGCT GCCACAAACC AGGTGAAGCT TTGTTCTAAG AATATTTGTT 120
TCATCTAGTT TATGAGTCCA AATGATATAG ACTGTAAATG TCACAGCAGT GGTGAAAGAC 180
TGCTCGGTC ATG AGC ACC GAC AGT AAC TCA CTG GCA CGT GAA TTT CTG 228
Met Ser Thr Asp Ser Asn Ser Leu Ala Arg Glu Phe Leu
1 5 10
ACC GAT GTC AAC CGG CTT TGC AAT GCA GTG GTC CAG AGG GTG GAG GCC 276
Thr Asp Val Asn Arg Leu Cys Asn Ala Val Val Gln Arg Val Glu Ala
15 20 25
AGG GAG GAA GAA GAG GAG GAG ACG CAC ATG GCA ACC CTT GGA CAG TAC 324
Arg Glu Glu Glu Glu Glu Glu Thr His Met Ala Thr Leu Gly Gln Tyr
30 35 40 45
CTT GTC CAT GGT CGA GGA TTT CTA TTA CTT ACC AAG CTA AAT TCT ATA 372
Leu Val His Gly Arg Gly Phe Leu Leu Leu Thr Lys Leu Asn Ser Ile
50 55 60
ATT GAT CAG GCA TTG ACA TGT AGA GAA GAA CTC CTG ACT CTT CTT CTG 420
Ile Asp Gln Ala Leu Thr Cys Arg Glu Glu Leu Leu Thr Leu Leu Leu
65 70 75
TCT CTC CTT CCA CTG GTA TGG AAG ATA CCT GTC CAA GAA GAA AAG GCA 468
Ser Leu Leu Pro Leu Val Trp Lys Ile Pro Val Gln Glu Glu Lys Ala
80 85 90
ACA GAT TTT AAC CTA CCG CTC TCA GCA GAT ATA ATC CTG ACC AAA GAA 516
Thr Asp Phe Asn Leu Pro Leu Ser Ala Asp Ile Ile Leu Thr Lys Glu
95 100 105
AAG AAC TCA AGT TCA CAA AGA TCC ACT CAG GAA AAA TTA CAT TTA GAA 564
Lys Asn Ser Ser Ser Gln Arg Ser Thr Gln Glu Lys Leu His Leu Glu
110 115 120 125
GGA AGT GCC CTG TCT AGT CAG GTT TCT GCA AAA GTA AAT GTT TTT CGA 612
Gly Ser Ala Leu Ser Ser Gln Val Ser Ala Lys Val Asn Val Phe Arg
130 135 140
AAA AGC AGA CGA CAG CGT AAA ATT ACC CAT CGC TAT TCT GTA AGA GAT 660
Lys Ser Arg Arg Gln Arg Lys Ile Thr His Arg Tyr Ser Val Arg Asp
145 150 155
GCA AGA AAG ACA CAG CTC TCC ACC TCA GAT TCA GAA GCC AAT TCA GAT 708
Ala Arg Lys Thr Gln Leu Ser Thr Ser Asp Ser Glu Ala Asn Ser Asp
160 165 170
GAA AAA GGC ATA GCA ATG AAT AAG CAT AGA AGG CCC CAT CTG CTG CAT 756
Glu Lys Gly Ile Ala Met Asn Lys His Arg Arg Pro His Leu Leu His
175 180 185
CAT TTT TTA
ACA TCG TTT CCT AAA CAA GAC CAC CCC AAAGCT AAA CTT 804 His Phe Leu Thr Ser Phe Pro Lys Gln Asp His Pro Lys AlaLys Leu 190 195 200 205 GAC CGC TTA GCA ACC AAA GAA CAG ACT CCT CCA GATGCT ATG GCT TTG 852 Asp Arg Leu Ala Thr Lys Glu Gln Thr Pro Pro Asp AlaMet Ala Leu 210 215 220 GAA AAT TCC AGA GAG ATT ATT CCA AGA CAG GGG TCAAAC ACT GAC ATT 900 Glu Asn Ser Arg Glu Ile Ile Pro Arg Gln Gly Ser AsnThr Asp Ile 225 230 235 TTA AGT GAG CCA GCT GCC TTG TCT GTT ATC AGT AACATG AAC AAT TCT 948 Leu Ser Glu Pro Ala Ala Leu Ser Val Ile Ser Asn MetAsn Asn Ser 240 245 250 CCA TTT GAC TTA TGT CAT GTT TTG TTA TCT TTA TTAGAA AAA GTT TGT 996 Pro Phe Asp Leu Cys His Val Leu Leu Ser Leu Leu GluLys Val Cys 255 260 265 AAG TTT GAC GTT ACC TTG AAT CAT AAT TCT CCT TTAGCA GCC AGT GTA 1044 Lys Phe Asp Val Thr Leu Asn His Asn Ser Pro Leu AlaAla Ser Val 270 275 280 285 GTG CCC ACA CTA ACT GAA TTC CTA GCA GGC TTTGGG GAC TGC TGC AGT 1092 Val Pro Thr Leu Thr Glu Phe Leu Ala Gly Phe GlyAsp Cys Cys Ser 290 295 300 CTG AGC GAC AAC TTG GAG AGT CGA GTA GTT TCTGCA GGT TGG ACC GAA 1140 Leu Ser Asp Asn Leu Glu Ser Arg Val Val Ser AlaGly Trp Thr Glu 305 310 315 GAA CCG GTG GCT TTG ATT CAA AGG ATG CTC TTTCGA ACA GTG TTG CAT 1188 Glu Pro Val Ala Leu Ile Gln Arg Met Leu Phe ArgThr Val Leu His 320 325 330 CTT CTG TCA GTA GAT GTT AGT ACT GCA GAG ATGATG CCA GAA AAT CTT 1236 Leu Leu Ser Val Asp Val Ser Thr Ala Glu Met MetPro Glu Asn Leu 335 340 345 AGG AAA AAT TTA ACT GAA TTG CTT AGA GCA GCTTTA AAA ATT AGA ATA 1284 Arg Lys Asn Leu Thr Glu Leu Leu Arg Ala Ala LeuLys Ile Arg Ile 350 355 360 365 TGC CTA GAA AAG CAG CCT GAC CCT TTT GCACCA AGA CAA AAG AAA ACA 1332 Cys Leu Glu Lys Gln Pro Asp Pro Phe Ala ProArg Gln Lys Lys Thr 370 375 380 CTG CAG GAG GTT CAG GAA GAT TTT GTG TTTTCA AAG TAT CGT CAT AGA 1380 Leu Gln Glu Val Gln Glu Asp Phe Val Phe SerLys Tyr Arg His Arg 385 390 395 GCC CTT CTT TTA CCT GAG CTT TTG GAA GGAGTT CTT CAG ATT CTG ATC 1428 Ala Leu Leu Leu Pro Glu Leu Leu Glu Gly ValLeu Gln Ile Leu Ile 400 405 410 
TGT TGT CTT CAA AGT GCA GCT TCA AAT CCCTTC TAC TTC AGT CAA GCC 1476 Cys Cys Leu Gln Ser Ala Ala Ser Asn Pro PheTyr Phe Ser Gln Ala 415 420 425 ATG GAT TTG GTT CAA GAA TTC ATT CAG CATCAT GGA TTT AAT TTA TTT 1524 Met Asp Leu Val Gln Glu Phe Ile Gln His HisGly Phe Asn Leu Phe 430 435 440 445 GAA ACA GCA GTT CTT CAA ATG GAA TGGCTG GTT TTA AGA GAT GGA GTT 1572 Glu Thr Ala Val Leu Gln Met Glu Trp LeuVal Leu Arg Asp Gly Val 450 455 460 CCT CCC GAG GCC TCA GAG CAT TTG AAAGCC CTA ATA AAT AGT GTG ATG 1620 Pro Pro Glu Ala Ser Glu His Leu Lys AlaLeu Ile Asn Ser Val Met 465 470 475 AAA ATA ATG AGC ACT GTC AAA AAA GTGAAA TCA GAG CAA CTT CAT CAT 1668 Lys Ile Met Ser Thr Val Lys Lys Val LysSer Glu Gln Leu His His 480 485 490 TCG ATG TGT ACA AGA AAA AGG CAC AGACGA TGT GAA TAT TCT CAT TTT 1716 Ser Met Cys Thr Arg Lys Arg His Arg ArgCys Glu Tyr Ser His Phe 495 500 505 ATG CAT CAT CAC CGA GAT CTC TCA GGTCTT CTG GTT TCG GCT TTT AAA 1764 Met His His His Arg Asp Leu Ser Gly LeuLeu Val Ser Ala Phe Lys 510 515 520 525 AAC CAG GTT TCC AAA AAC CCA TTTGAA GAG ACT GCA GAT GGA GAT GTT 1812 Asn Gln Val Ser Lys Asn Pro Phe GluGlu Thr Ala Asp Gly Asp Val 530 535 540 TAT TAT CCT GAG CGG TGC TGT TGCATT GCA GTG TGT GCC CAT CAG TGC 1860 Tyr Tyr Pro Glu Arg Cys Cys Cys IleAla Val Cys Ala His Gln Cys 545 550 555 TTG CGC TTA CTA CAG CAG GCT TCCTTG AGC AGC ACT TGT GTC CAG ATC 1908 Leu Arg Leu Leu Gln Gln Ala Ser LeuSer Ser Thr Cys Val Gln Ile 560 565 570 CTA TCG GGT GTT CAT AAC ATT GGAATA TGC TGT TGT ATG GAT CCC AAA 1956 Leu Ser Gly Val His Asn Ile Gly IleCys Cys Cys Met Asp Pro Lys 575 580 585 TCT GTA ATC ATT CCT TTG CTC CATGCT TTT AAA TTG CCA GCA CTG AAA 2004 Ser Val Ile Ile Pro Leu Leu His AlaPhe Lys Leu Pro Ala Leu Lys 590 595 600 605 AAT TTT CAG CAG CAT ATA TTGAAT ATC CTT AAC AAA CTT ATT TTG GAT 2052 Asn Phe Gln Gln His Ile Leu AsnIle Leu Asn Lys Leu Ile Leu Asp 610 615 620 CAG TTA GGA GGA GCA GAG ATATCA CCA AAA ATT AAA AAA GCA GCT TGT 2100 Gln Leu Gly Gly Ala Glu Ile SerPro Lys Ile Lys Lys Ala Ala 
Cys 625 630 635 AAT ATT TGT ACT GTT GAC TCTGAC CAA CTA GCC CAA TTA GAA GAG ACA 2148 Asn Ile Cys Thr Val Asp Ser AspGln Leu Ala Gln Leu Glu Glu Thr 640 645 650 CTG CAG GGA AAC TTA TGT GATGCT GAA CTC TCC TCA AGT TTA TCC AGT 2196 Leu Gln Gly Asn Leu Cys Asp AlaGlu Leu Ser Ser Ser Leu Ser Ser 655 660 665 CCT TCT TAC AGA TTT CAA GGGATC CTG CCC AGC AGT GGA TCT GAA GAT 2244 Pro Ser Tyr Arg Phe Gln Gly IleLeu Pro Ser Ser Gly Ser Glu Asp 670 675 680 685 TTG TTG TGG AAA TGG GATGCT TTA AAG GCT TAT CAG AAC TTT GTT TTT 2292 Leu Leu Trp Lys Trp Asp AlaLeu Lys Ala Tyr Gln Asn Phe Val Phe 690 695 700 GAA GAA GAC AGA TTA CATAGT ATA CAG ATT GCA AAT CAC ATT TGC AAT 2340 Glu Glu Asp Arg Leu His SerIle Gln Ile Ala Asn His Ile Cys Asn 705 710 715 TTA ATC CAG AAA GGC AATATA GTT GTT CAG TGG AAA TTA TAT AAT TAC 2388 Leu Ile Gln Lys Gly Asn IleVal Val Gln Trp Lys Leu Tyr Asn Tyr 720 725 730 ATA TTT AAT CCT GTG CTCCAA AGA GGA GTT GAA TTA GCA CAT CAT TGT 2436 Ile Phe Asn Pro Val Leu GlnArg Gly Val Glu Leu Ala His His Cys 735 740 745 CAA CAC CTA AGC GTT ACTTCA GCT CAA AGT CAT GTA TGT AGC CAT CAT 2484 Gln His Leu Ser Val Thr SerAla Gln Ser His Val Cys Ser His His 750 755 760 765 AAC CAG TGC TTG CCTCAG GAC GTG CTT CAG ATT TAT GTA AAA ACT CTG 2532 Asn Gln Cys Leu Pro GlnAsp Val Leu Gln Ile Tyr Val Lys Thr Leu 770 775 780 CCT ATC CTG CTT AAATCC AGG GTA ATA AGA GAT TTG TTT TTG AGT TGT 2580 Pro Ile Leu Leu Lys SerArg Val Ile Arg Asp Leu Phe Leu Ser Cys 785 790 795 AAT GGA GTA AGT CAAATA ATC GAA TTA AAT TGC TTA AAT GGT ATT CGA 2628 Asn Gly Val Ser Gln IleIle Glu Leu Asn Cys Leu Asn Gly Ile Arg 800 805 810 AGT CAT TCT CTA AAAGCA TTT GAA ACT CTG ATA ATC AGC CTA GGG GAG 2676 Ser His Ser Leu Lys AlaPhe Glu Thr Leu Ile Ile Ser Leu Gly Glu 815 820 825 CAA CAG AAA GAT GCCTCA GTT CCA GAT ATT GAT GGG ATA GAC ATT GAA 2724 Gln Gln Lys Asp Ala SerVal Pro Asp Ile Asp Gly Ile Asp Ile Glu 830 835 840 845 CAG AAG GAG TTGTCC TCT GTA CAT GTG GGT ACT TCT TTT CAT CAT CAG 2772 Gln Lys Glu Leu SerSer Val His Val Gly Thr 
Ser Phe His His Gln 850 855 860 CAA GCT TAT TCAGAT TCT CCT CAG AGT CTC AGC AAA TTT TAT GCT GGC 2820 Gln Ala Tyr Ser AspSer Pro Gln Ser Leu Ser Lys Phe Tyr Ala Gly 865 870 875 CTC AAA GAA GCTTAT CCA AAG AGA CGG AAG ACT GTT AAC CAA GAT GTT 2868 Leu Lys Glu Ala TyrPro Lys Arg Arg Lys Thr Val Asn Gln Asp Val 880 885 890 CAT ATC AAC ACAATA AAC CTA TTC CTC TGT GTG GCT TTT TTA TGC GTA 2916 His Ile Asn Thr IleAsn Leu Phe Leu Cys Val Ala Phe Leu Cys Val 895 900 905 AGT AAA GAA GCAGAG TCT GAC AGG GAG TCG GCC AAT GAC TCA GAA GAT 2964 Ser Lys Glu Ala GluSer Asp Arg Glu Ser Ala Asn Asp Ser Glu Asp 910 915 920 925 ACT TCT GGCTAT GAC AGC ACA GCC AGC GAG CCT TTA AGT CAT ATG CTG 3012 Thr Ser Gly TyrAsp Ser Thr Ala Ser Glu Pro Leu Ser His Met Leu 930 935 940 CCA TGT ATATCT CTC GAG AGC CTT GTC TTG CCT TCT CCT GAA CAT ATG 3060 Pro Cys Ile SerLeu Glu Ser Leu Val Leu Pro Ser Pro Glu His Met 945 950 955 CAC CAA GCAGCA GAC ATT TGG TCT ATG TGT CGT TGG ATC TAC ATG TTG 3108 His Gln Ala AlaAsp Ile Trp Ser Met Cys Arg Trp Ile Tyr Met Leu 960 965 970 AGT TCA GTGTTC CAG AAA CAG TTT TAT AGG CTT GGT GGT TTC CGA GTA 3156 Ser Ser Val PheGln Lys Gln Phe Tyr Arg Leu Gly Gly Phe Arg Val 975 980 985 TGC CAT AAGTTA ATA TTT ATG ATA ATA CAG AAA CTG TTC AGA AGT CAC 3204 Cys His Lys LeuIle Phe Met Ile Ile Gln Lys Leu Phe Arg Ser His 990 995 1000 1005 AAAGAG GAG CAA GGA AAA AAG GAG GGA GAT ACA AGT GTA AAT GAA AAC 3252 Lys GluGlu Gln Gly Lys Lys Glu Gly Asp Thr Ser Val Asn Glu Asn 1010 1015 1020CAG GAT TTA AAC AGA ATT TCT CAA CCT AAG AGA ACT ATG AAG GAA GAT 3300 GlnAsp Leu Asn Arg Ile Ser Gln Pro Lys Arg Thr Met Lys Glu Asp 1025 10301035 TTA TTA TCT TTG GCT ATA AAA AGT GAC CCC ATA CCA TCA GAA CTA GGT3348 Leu Leu Ser Leu Ala Ile Lys Ser Asp Pro Ile Pro Ser Glu Leu Gly1040 1045 1050 AGT CTA AAA AAG AGT GCT GAC AGT TTA GGT AAA TTA GAG TTACAG CAT 3396 Ser Leu Lys Lys Ser Ala Asp Ser Leu Gly Lys Leu Glu Leu GlnHis 1055 1060 1065 ATT TCT TCC ATA AAT GTG GAA GAA GTT TCA GCT ACT GAAGCC GCT CCC 3444 Ile Ser Ser Ile Asn 
Val Glu Glu Val Ser Ala Thr Glu AlaAla Pro 1070 1075 1080 1085 GAG GAA GCA AAG CTA TTT ACA AGT CAA GAA AGTGAG ACC TCA CTT CAA 3492 Glu Glu Ala Lys Leu Phe Thr Ser Gln Glu Ser GluThr Ser Leu Gln 1090 1095 1100 AGT ATA CGA CTT TTG GAA GCC CTT CTG GCCATT TGT CTT CAT GGT GCC 3540 Ser Ile Arg Leu Leu Glu Ala Leu Leu Ala IleCys Leu His Gly Ala 1105 1110 1115 AGA ACT AGT CAA CAG AAG ATG GAA TTGGAG TTA CCT AAT CAG AAC TTG 3588 Arg Thr Ser Gln Gln Lys Met Glu Leu GluLeu Pro Asn Gln Asn Leu 1120 1125 1130 TCT GTG GAA AGT ATA TTA TTT GAAATG AGG GAC CAT CTT TCC CAG TCA 3636 Ser Val Glu Ser Ile Leu Phe Glu MetArg Asp His Leu Ser Gln Ser 1135 1140 1145 AAG GTG ATT GAA ACA CAA CTAGCA AAG CCG TTA TTT GAT GCC CTG CTT 3684 Lys Val Ile Glu Thr Gln Leu AlaLys Pro Leu Phe Asp Ala Leu Leu 1150 1155 1160 1165 CGA GTT GCC CTC GGGAAT TAT TCA GCA GAT TTT GAA CAT AAT GAT GCT 3732 Arg Val Ala Leu Gly AsnTyr Ser Ala Asp Phe Glu His Asn Asp Ala 1170 1175 1180 ATG ACT GAG AAGAGT CAT CAA TCT GCA GAA GAA TTG TCA TCC CAG CCT 3780 Met Thr Glu Lys SerHis Gln Ser Ala Glu Glu Leu Ser Ser Gln Pro 1185 1190 1195 GGT GAT TTTTCA GAA GAA GCT GAG GAT TCT CAG TGT TGT AGT TTT AAA 3828 Gly Asp Phe SerGlu Glu Ala Glu Asp Ser Gln Cys Cys Ser Phe Lys 1200 1205 1210 CTT TTAGTT GAA GAA GAA GGT TAC GAA GCA GAT AGT GAA AGC AAT CCT 3876 Leu Leu ValGlu Glu Glu Gly Tyr Glu Ala Asp Ser Glu Ser Asn Pro 1215 1220 1225 GAAGAT GGC GAA ACC CAG GAT GAT GGG GTA GAC TTA AAG TCT GAA ACA 3924 Glu AspGly Glu Thr Gln Asp Asp Gly Val Asp Leu Lys Ser Glu Thr 1230 1235 12401245 GAA GGT TTC AGT GCA TCA AGC AGT CCA AAT GAC TTA CTC GAA AAC CTC3972 Glu Gly Phe Ser Ala Ser Ser Ser Pro Asn Asp Leu Leu Glu Asn Leu1250 1255 1260 ACT CAA GGG GAA ATA ATT TAT CCT GAG ATT TGT ATG CTG GAATTA AAT 4020 Thr Gln Gly Glu Ile Ile Tyr Pro Glu Ile Cys Met Leu Glu LeuAsn 1265 1270 1275 TTG CTT TCT GCT AGT AAA GCC AAA CTT GAT GTG CTT GCCCAT GTA TTT 4068 Leu Leu Ser Ala Ser Lys Ala Lys Leu Asp Val Leu Ala HisVal Phe 1280 1285 1290 GAG AGT TTT TTG AAA ATT ATT 
AGG CAG AAA GAA AAGAAT GTT TTT CTG 4116 Glu Ser Phe Leu Lys Ile Ile Arg Gln Lys Glu Lys AsnVal Phe Leu 1295 1300 1305 CTC ATG CAA CAG GGA ACT GTG AAA AAT CTT TTAGGA GGG TTC TTG AGT 4164 Leu Met Gln Gln Gly Thr Val Lys Asn Leu Leu GlyGly Phe Leu Ser 1310 1315 1320 1325 ATT TTA ACA CAG GAT GAT TCT GAT TTTCAA GCA TGC CAG AGA GTA TTG 4212 Ile Leu Thr Gln Asp Asp Ser Asp Phe GlnAla Cys Gln Arg Val Leu 1330 1335 1340 GTG GAT CTT TTG GTA TCT TTG ATGAGT TCA AGA ACA TGT TCA GAA GAG 4260 Val Asp Leu Leu Val Ser Leu Met SerSer Arg Thr Cys Ser Glu Glu 1345 1350 1355 CTA ACC CTT CTT TTG AGA ATATTT CTG GAG AAA TCT CCT TGT ACA AAA 4308 Leu Thr Leu Leu Leu Arg Ile PheLeu Glu Lys Ser Pro Cys Thr Lys 1360 1365 1370 ATT CTT CTT CTG GGT ATTCTG AAA ATT ATT GAA AGT GAT ACT ACT ATG 4356 Ile Leu Leu Leu Gly Ile LeuLys Ile Ile Glu Ser Asp Thr Thr Met 1375 1380 1385 AGC CCT TCA CAG TATCTA ACC TTC CCT TTA CTG CAC GCT CCA AAT TTA 4404 Ser Pro Ser Gln Tyr LeuThr Phe Pro Leu Leu His Ala Pro Asn Leu 1390 1395 1400 1405 AGC AAC GGTGTT TCA TCA CAA AAG TAT CCT GGG ATT TTA AAC AGT AAG 4452 Ser Asn Gly ValSer Ser Gln Lys Tyr Pro Gly Ile Leu Asn Ser Lys 1410 1415 1420 GCC ATGGGT TTA TTG AGA AGA GCA CGA GTT TCA CGG AGC AAG AAA GAG 4500 Ala Met GlyLeu Leu Arg Arg Ala Arg Val Ser Arg Ser Lys Lys Glu 1425 1430 1435 GCTGAT AGA GAG AGT TTT CCC CAT CGG CTG CTT TCA TCT TGG CAC ATA 4548 Ala AspArg Glu Ser Phe Pro His Arg Leu Leu Ser Ser Trp His Ile 1440 1445 1450GCC CCA GTC CAC CTG CCG TTG CTG GGG CAA AAC TGC TGG CCA CAC CTA 4596 AlaPro Val His Leu Pro Leu Leu Gly Gln Asn Cys Trp Pro His Leu 1455 14601465 TCA GAA GGT TTC AGT GTT TCC CTG TGG TTT AAT GTG GAG TGT ATC CAT4644 Ser Glu Gly Phe Ser Val Ser Leu Trp Phe Asn Val Glu Cys Ile His1470 1475 1480 1485 GAA GCT GAG AGT ACT ACA GAA AAA GGA AAG AAG ATA AAGAAA AGA AAC 4692 Glu Ala Glu Ser Thr Thr Glu Lys Gly Lys Lys Ile Lys LysArg Asn 1490 1495 1500 AAA TCA TTA ATT TTA CCA GAT AGC AGT TTT GAT GGTACA GAG AGC GAC 4740 Lys Ser Leu Ile Leu Pro Asp Ser Ser Phe Asp Gly 
ThrGlu Ser Asp 1505 1510 1515 AGA CCA GAA GGT GCA GAG TAC ATA AAT CCT GGTGAA AGA CTC ATA GAA 4788 Arg Pro Glu Gly Ala Glu Tyr Ile Asn Pro Gly GluArg Leu Ile Glu 1520 1525 1530 GAA GGA TGT ATT CAT ATA ATT TCA CTG GGATCC AAA GCG TTG ATG ATC 4836 Glu Gly Cys Ile His Ile Ile Ser Leu Gly SerLys Ala Leu Met Ile 1535 1540 1545 CAA GTG TGG GCT GAT CCC CAC AAT GCCACT CTT ATC TTT CGT GTG TGC 4884 Gln Val Trp Ala Asp Pro His Asn Ala ThrLeu Ile Phe Arg Val Cys 1550 1555 1560 1565 ATG GAT TCA AAT GAT GAC ATGAAA GCT GTT TTA CTA GCA CAG GTT GAA 4932 Met Asp Ser Asn Asp Asp Met LysAla Val Leu Leu Ala Gln Val Glu 1570 1575 1580 TCA CAG GAG AAT ATT TTCCTC CCA AGC AAA TGG CAA CAT TTA GTA CTC 4980 Ser Gln Glu Asn Ile Phe LeuPro Ser Lys Trp Gln His Leu Val Leu 1585 1590 1595 ACC TAC TTA CAG CAGCCC CAA GGG AAA AGG AGG ATT CAT GGG AAA ATC 5028 Thr Tyr Leu Gln Gln ProGln Gly Lys Arg Arg Ile His Gly Lys Ile 1600 1605 1610 TCC ATA TGG GTCTCT GGA CAG AGG AAG CCT GAT GTT ACT TTG GAT TTT 5076 Ser Ile Trp Val SerGly Gln Arg Lys Pro Asp Val Thr Leu Asp Phe 1615 1620 1625 ATG CTT CCAAGA AAA ACA AGT TTG TCA TCT GAT AGC AAT AAA ACA TTT 5124 Met Leu Pro ArgLys Thr Ser Leu Ser Ser Asp Ser Asn Lys Thr Phe 1630 1635 1640 1645 TGCATG ATT GGC CAT TGT TTA TCA TCC CAA GAA GAG TTT TTG CAG TTG 5172 Cys MetIle Gly His Cys Leu Ser Ser Gln Glu Glu Phe Leu Gln Leu 1650 1655 1660GCT GGA AAA TGG GAC CTG GGA AAT TTG CTT CTC TTC AAC GGA GCT AAG 5220 AlaGly Lys Trp Asp Leu Gly Asn Leu Leu Leu Phe Asn Gly Ala Lys 1665 16701675 GTT GGT TCA CAA GAG GCC TTT TAT CTG TAT GCT TGT GGA CCC AAC CAT5268 Val Gly Ser Gln Glu Ala Phe Tyr Leu Tyr Ala Cys Gly Pro Asn His1680 1685 1690 ACA TCT GTA ATG CCA TGT AAG TAT GGC AAG CCA GTC AAT GACTAC TCC 5316 Thr Ser Val Met Pro Cys Lys Tyr Gly Lys Pro Val Asn Asp TyrSer 1695 1700 1705 AAA TAT ATT AAT AAA GAA ATT TTG CGA TGT GAA CAA ATCAGA GAA CTT 5364 Lys Tyr Ile Asn Lys Glu Ile Leu Arg Cys Glu Gln Ile ArgGlu Leu 1710 1715 1720 1725 TTT ATG ACC AAG AAA GAT GTG GAT ATT GGT CTCTTA ATT GAA 
AGT CTT 5412 Phe Met Thr Lys Lys Asp Val Asp Ile Gly Leu LeuIle Glu Ser Leu 1730 1735 1740 TCA GTT GTT TAT ACA ACT TAC TGT CCT GCTCAG TAT ACC ATC TAT GAA 5460 Ser Val Val Tyr Thr Thr Tyr Cys Pro Ala GlnTyr Thr Ile Tyr Glu 1745 1750 1755 CCA GTG ATT AGA CTT AAA GGT CAA ATGAAA ACC CAA CTC TCT CAA AGA 5508 Pro Val Ile Arg Leu Lys Gly Gln Met LysThr Gln Leu Ser Gln Arg 1760 1765 1770 CCC TTC AGC TCA AAA GAA GTT CAGAGC ATC TTA TTA GAA CCT CAT CAT 5556 Pro Phe Ser Ser Lys Glu Val Gln SerIle Leu Leu Glu Pro His His 1775 1780 1785 CTA AAA AAT CTC CAA CCT ACTGAA TAT AAA ACT ATT CAA GGC ATT CTG 5604 Leu Lys Asn Leu Gln Pro Thr GluTyr Lys Thr Ile Gln Gly Ile Leu 1790 1795 1800 1805 CAC GAA ATT GGT GGAACT GGC ATA TTT GTT TTT CTC TTT GCC AGG GTT 5652 His Glu Ile Gly Gly ThrGly Ile Phe Val Phe Leu Phe Ala Arg Val 1810 1815 1820 GTT GAA CTC AGTAGC TGT GAA GAA ACT CAA GCA TTA GCA CTG CGA GTT 5700 Val Glu Leu Ser SerCys Glu Glu Thr Gln Ala Leu Ala Leu Arg Val 1825 1830 1835 ATA CTC TCATTA ATT AAA TAC AAC CAA CAA AGA GTA CAT GAA TTA GAA 5748 Ile Leu Ser LeuIle Lys Tyr Asn Gln Gln Arg Val His Glu Leu Glu 1840 1845 1850 AAT TGTAAT GGA CTT TCT ATG ATT CAT CAG GTG TTG ATC AAA CAA AAA 5796 Asn Cys AsnGly Leu Ser Met Ile His Gln Val Leu Ile Lys Gln Lys 1855 1860 1865 TGCATT GTT GGG TTT TAC ATT TTG AAG ACC CTT CTT GAA GGA TGC TGT 5844 Cys IleVal Gly Phe Tyr Ile Leu Lys Thr Leu Leu Glu Gly Cys Cys 1870 1875 18801885 GGT GAA GAT ATT ATT TAT ATG AAT GAG AAT GGA GAG TTT AAG TTG GAT5892 Gly Glu Asp Ile Ile Tyr Met Asn Glu Asn Gly Glu Phe Lys Leu Asp1890 1895 1900 GTA GAC TCT AAT GCT ATA ATC CAA GAT GTT AAG CTG TTA GAGGAA CTA 5940 Val Asp Ser Asn Ala Ile Ile Gln Asp Val Lys Leu Leu Glu GluLeu 1905 1910 1915 TTG CTT GAC TGG AAG ATA TGG AGT AAA GCA GAG CAA GGTGTT TGG GAA 5988 Leu Leu Asp Trp Lys Ile Trp Ser Lys Ala Glu Gln Gly ValTrp Glu 1920 1925 1930 ACT TTG CTA GCA GCT CTA GAA GTC CTC ATC AGA GCAGAT CAC CAC CAG 6036 Thr Leu Leu Ala Ala Leu Glu Val Leu Ile Arg Ala AspHis His Gln 1935 1940 1945 
CAG ATG TTT AAT ATT AAG CAG TTA TTG AAA GCTCAA GTG GTT CAT CAC 6084 Gln Met Phe Asn Ile Lys Gln Leu Leu Lys Ala GlnVal Val His His 1950 1955 1960 1965 TTT CTA CTG ACT TGT CAG GTT TTG CAGGAA TAC AAA GAG GGG CAA CTC 6132 Phe Leu Leu Thr Cys Gln Val Leu Gln GluTyr Lys Glu Gly Gln Leu 1970 1975 1980 ACA CCC ATG CCC CGA GAG GTT TGTAGA TCA TTT GTG AAA ATT ATA GCA 6180 Thr Pro Met Pro Arg Glu Val Cys ArgSer Phe Val Lys Ile Ile Ala 1985 1990 1995 GAA GTC CTT GGA TCT CCT CCAGAT TTG GAA TTA TTG ACA ATT ATC TTC 6228 Glu Val Leu Gly Ser Pro Pro AspLeu Glu Leu Leu Thr Ile Ile Phe 2000 2005 2010 AAT TTC CTT TTA GCA GTTCAC CCT CCT ACT AAT ACT TAC GTT TGT CAC 6276 Asn Phe Leu Leu Ala Val HisPro Pro Thr Asn Thr Tyr Val Cys His 2015 2020 2025 AAT CCC ACG AAC TTCTAC TTT TCT TTG CAC ATA GAT GGC AAG ATC TTT 6324 Asn Pro Thr Asn Phe TyrPhe Ser Leu His Ile Asp Gly Lys Ile Phe 2030 2035 2040 2045 CAG GAG AAAGTG CGG TCA ATC ATG TAC CTG AGG CAT TCC AGC AGT GGA 6372 Gln Glu Lys ValArg Ser Ile Met Tyr Leu Arg His Ser Ser Ser Gly 2050 2055 2060 GGA AGGTCC CTT ATG AGC CCT GGA TTT ATG GTA ATA AGC CCA TCT GGT 6420 Gly Arg SerLeu Met Ser Pro Gly Phe Met Val Ile Ser Pro Ser Gly 2065 2070 2075 TTTACT GCT TCA CCA TAT GAA GGA GAG AAT TCC TCT AAT ATT ATT CCA 6468 Phe ThrAla Ser Pro Tyr Glu Gly Glu Asn Ser Ser Asn Ile Ile Pro 2080 2085 2090CAA CAG ATG GCC GCC CAT ATG CTG CGT TCT AGA AGC CTA CCA GCA TTC 6516 GlnGln Met Ala Ala His Met Leu Arg Ser Arg Ser Leu Pro Ala Phe 2095 21002105 CCT ACT TCT TCA CTA CTA ACG CAA TCA CAA AAA CTG ACT GGA AGT TTG6564 Pro Thr Ser Ser Leu Leu Thr Gln Ser Gln Lys Leu Thr Gly Ser Leu2110 2115 2120 2125 GGT TGT AGT ATC GAC AGG TTA CAA AAT ATT GCA GAT ACTTAT GTT GCC 6612 Gly Cys Ser Ile Asp Arg Leu Gln Asn Ile Ala Asp Thr TyrVal Ala 2130 2135 2140 ACC CAA TCA AAG AAA CAA AAT TCT TTG GGG AGT TCCGAC ACA CTG AAA 6660 Thr Gln Ser Lys Lys Gln Asn Ser Leu Gly Ser Ser AspThr Leu Lys 2145 2150 2155 AAA GGC AAA GAG GAC GCA TTC ATC AGT AGC TGTGAG TCT GCA AAA ACT 6708 Lys Gly Lys Glu Asp 
Ala Phe Ile Ser Ser Cys GluSer Ala Lys Thr 2160 2165 2170 GTT TGT GAA ATG GAA GCT GTC CTC TCA GCCCAG GTC TCT GTC AGT GAT 6756 Val Cys Glu Met Glu Ala Val Leu Ser Ala GlnVal Ser Val Ser Asp 2175 2180 2185 GTC CCA AAG GGA GTG CTG GGA TTT CCAGTG GTC AAA GCA GAT CAT AAA 6804 Val Pro Lys Gly Val Leu Gly Phe Pro ValVal Lys Ala Asp His Lys 2190 2195 2200 2205 CAG TTG GGA GCA GAA CCC AGGTCA GAA GAT GAC AGT CCT GGG GAT GAG 6852 Gln Leu Gly Ala Glu Pro Arg SerGlu Asp Asp Ser Pro Gly Asp Glu 2210 2215 2220 TCC TGC CCA CGC CGA CCTGAT TAC CTA AAG GGA TTG GCC TCC TTC CAG 6900 Ser Cys Pro Arg Arg Pro AspTyr Leu Lys Gly Leu Ala Ser Phe Gln 2225 2230 2235 CGA AGC CAC AGC ACTATT GCA AGC CTT GGG CTA GCT TTT CCT TCA CAG 6948 Arg Ser His Ser Thr IleAla Ser Leu Gly Leu Ala Phe Pro Ser Gln 2240 2245 2250 AAC GGA TCT GCAGCT GTT GGC CGT TGG CCA AGT CTT GTT GAT AGA AAC 6996 Asn Gly Ser Ala AlaVal Gly Arg Trp Pro Ser Leu Val Asp Arg Asn 2255 2260 2265 ACT GAT GATTGG GAA AAC TTT GCC TAT TCT CTT GGT TAT GAG CCA AAT 7044 Thr Asp Asp TrpGlu Asn Phe Ala Tyr Ser Leu Gly Tyr Glu Pro Asn 2270 2275 2280 2285 TACAAC CGA ACT GCA AGT GCT CAC AGT GTA ACT GAA GAC TGT TTG GTA 7092 Tyr AsnArg Thr Ala Ser Ala His Ser Val Thr Glu Asp Cys Leu Val 2290 2295 2300CCT ATA TGC TGT GGA TTA TAT GAA CTC CTA AGT GGG GTT CTT CTT ATC 7140 ProIle Cys Cys Gly Leu Tyr Glu Leu Leu Ser Gly Val Leu Leu Ile 2305 23102315 CTG CCT GAT GTT TTG CTT GAA GAT GTG ATG GAC AAG CTT ATT CAA GCA7188 Leu Pro Asp Val Leu Leu Glu Asp Val Met Asp Lys Leu Ile Gln Ala2320 2325 2330 GAT ACA CTT TTG GTC CTC GTT AAC CAC CCA TCA CCA GCT ATACAA CAA 7236 Asp Thr Leu Leu Val Leu Val Asn His Pro Ser Pro Ala Ile GlnGln 2335 2340 2345 GGT GTT ATT AAA CTA TTA GAT GCA TAT TTT GCT AGA GCATCT AAG GAA 7284 Gly Val Ile Lys Leu Leu Asp Ala Tyr Phe Ala Arg Ala SerLys Glu 2350 2355 2360 2365 CAA AAA GAT AAA TTT CTG AAG AAT CGT GGA TTTTCC TTG CTA GCC AAC 7332 Gln Lys Asp Lys Phe Leu Lys Asn Arg Gly Phe SerLeu Leu Ala Asn 2370 2375 2380 CAG TTG TAT CTT CAT CGA GGA 
ACT CAA GAATTG TTA GAA TGC TTC ATC 7380 Gln Leu Tyr Leu His Arg Gly Thr Gln Glu LeuLeu Glu Cys Phe Ile 2385 2390 2395 GAA ATG TTC TTT GGT CGA CAT ATT GGCCTT GAT GAA GAA TTT GAT CTG 7428 Glu Met Phe Phe Gly Arg His Ile Gly LeuAsp Glu Glu Phe Asp Leu 2400 2405 2410 GAA GAT GTG AGA AAC ATG GGA TTGTTT CAG AAG TGG TCT GTC ATT CCT 7476 Glu Asp Val Arg Asn Met Gly Leu PheGln Lys Trp Ser Val Ile Pro 2415 2420 2425 ATT CTG GGA CTA ATA GAG ACCTCT CTA TAT GAC AAC ATA CTC TTG CAT 7524 Ile Leu Gly Leu Ile Glu Thr SerLeu Tyr Asp Asn Ile Leu Leu His 2430 2435 2440 2445 AAT GCT CTT TTA CTTCTT CCC CAT CAT GCA GTA GTT CAA AAG CGG AAA 7572 Asn Ala Leu Leu Leu LeuPro His His Ala Val Val Gln Lys Arg Lys 2450 2455 2460 AGC ATT GCT GGTCCT CGA AAA TTT CCC CTT GCT CAA ACT GAA TCG CTT 7620 Ser Ile Ala Gly ProArg Lys Phe Pro Leu Ala Gln Thr Glu Ser Leu 2465 2470 2475 CTG ATG AAAATG CGT TCA GTG GCA AAT GAT GAG CTT CAT GTG ATG ATG 7668 Leu Met Lys MetArg Ser Val Ala Asn Asp Glu Leu His Val Met Met 2480 2485 2490 CAA CGGAGA ATG AGC CAA GAG AAC CCT AGC CAA GCA ACT GAA ACG GAA 7716 Gln Arg ArgMet Ser Gln Glu Asn Pro Ser Gln Ala Thr Glu Thr Glu 2495 2500 2505 CTTGCG CAG AGA CTA CAG AGG CTC ACT GTT TTA GCA GTC AAC AGG ATT 7764 Leu AlaGln Arg Leu Gln Arg Leu Thr Val Leu Ala Val Asn Arg Ile 2510 2515 25202525 ATT TAT CAA GAA TTT AAT TCA GAC ATT ATT GAC ATT TTG AGA ACT CCA7812 Ile Tyr Gln Glu Phe Asn Ser Asp Ile Ile Asp Ile Leu Arg Thr Pro2530 2535 2540 GAA AAT GTA ACT CAA AGC AAG ACC TCA GTT TTC CAG ACC GAAATT TCT 7860 Glu Asn Val Thr Gln Ser Lys Thr Ser Val Phe Gln Thr Glu IleSer 2545 2550 2555 GAG GAA AAT ATT CAT CAT GAA CAG TCT TCT GTT TTC AATCCA TTT CAG 7908 Glu Glu Asn Ile His His Glu Gln Ser Ser Val Phe Asn ProPhe Gln 2560 2565 2570 AAA GAA ATT TTT ACA TAT CTG GTA GAA GGA TTC AAAGTA TCT ATT GGT 7956 Lys Glu Ile Phe Thr Tyr Leu Val Glu Gly Phe Lys ValSer Ile Gly 2575 2580 2585 TCA AGT AAA GCC AGT GGT TCC AAG CAG CAA TGGACT AAA ATT CTG TGG 8004 Ser Ser Lys Ala Ser Gly Ser Lys Gln Gln Trp ThrLys 
Ile Leu Trp 2590 2595 2600 2605 TCT TGT AAG GAG ACC TTC CGA ATG CAGCTT GGG AGA CTA CTA GTG CAT 8052 Ser Cys Lys Glu Thr Phe Arg Met Gln LeuGly Arg Leu Leu Val His 2610 2615 2620 ATT TTG TCG CCA GCC CAC GCT GCACAA GAG AGA AAG CAA ATT TTT GAA 8100 Ile Leu Ser Pro Ala His Ala Ala GlnGlu Arg Lys Gln Ile Phe Glu 2625 2630 2635 ATA GTT CAT GAA CCA AAT CATCAG GAA ATA CTA CGA GAC TGT CTC AGC 8148 Ile Val His Glu Pro Asn His GlnGlu Ile Leu Arg Asp Cys Leu Ser 2640 2645 2650 CCA TCC CTA CAA CAT GGAGCC AAG TTA GTT TTG TAT TTG TCA GAG TTG 8196 Pro Ser Leu Gln His Gly AlaLys Leu Val Leu Tyr Leu Ser Glu Leu 2655 2660 2665 ATA CAT AAT CAC CAAGGT GAA TTG ACT GAA GAA GAG CTA GGC ACA GCA 8244 Ile His Asn His Gln GlyGlu Leu Thr Glu Glu Glu Leu Gly Thr Ala 2670 2675 2680 2685 GAA CTG CTTATG AAT GCT TTG AAG TTA TGT GGT CAC AAG TGC ATC CCT 8292 Glu Leu Leu MetAsn Ala Leu Lys Leu Cys Gly His Lys Cys Ile Pro 2690 2695 2700 CCC AGTGCA TCA ACA AAA GCA GAC CTT ATT AAA ATG ATC AAA GAG GAA 8340 Pro Ser AlaSer Thr Lys Ala Asp Leu Ile Lys Met Ile Lys Glu Glu 2705 2710 2715 CAAAAG AAA TAT GAA ACT GAA GAA GGA GTG AAT AAA GCT GCT TGG CAG 8388 Gln LysLys Tyr Glu Thr Glu Glu Gly Val Asn Lys Ala Ala Trp Gln 2720 2725 2730AAA ACA GTT AAC AAT AAT CAA CAA AGT CTC TTT CAG CGT CTG GAT TCA 8436 LysThr Val Asn Asn Asn Gln Gln Ser Leu Phe Gln Arg Leu Asp Ser 2735 27402745 AAA TCA AAG GAT ATA TCT AAA ATA GCT GCA GAT ATC ACC CAG GCA GTG8484 Lys Ser Lys Asp Ile Ser Lys Ile Ala Ala Asp Ile Thr Gln Ala Val2750 2755 2760 2765 TCT CTC TCC CAA GGA AAT GAG AGA AAA AAG GTG ATC CAGCAT ATT AGA 8532 Ser Leu Ser Gln Gly Asn Glu Arg Lys Lys Val Ile Gln HisIle Arg 2770 2775 2780 GGA ATG TAT AAA GTA GAT TTG AGT GCC AGC AGA CATTGG CAG GAA CTT 8580 Gly Met Tyr Lys Val Asp Leu Ser Ala Ser Arg His TrpGln Glu Leu 2785 2790 2795 ATT CAG CAG CTG ACA CAT GAT AGA GCA GTA TGGTAT GAC CCC ATC TAC 8628 Ile Gln Gln Leu Thr His Asp Arg Ala Val Trp TyrAsp Pro Ile Tyr 2800 2805 2810 TAT CCA ACC TCA TGG CAG TTG GAT CCA ACAGAA GGG CCA AAT CGA 
GAG 8676 Tyr Pro Thr Ser Trp Gln Leu Asp Pro Thr GluGly Pro Asn Arg Glu 2815 2820 2825 AGG AGA CGT TTA CAG AGA TGT TAT TTAACT ATT CCA AAT AAG TAT CTC 8724 Arg Arg Arg Leu Gln Arg Cys Tyr Leu ThrIle Pro Asn Lys Tyr Leu 2830 2835 2840 2845 CTT AGG GAT AGA CAG AAA TCAGAA GAT GTT GTC AAA CCA CCA CTC TCT 8772 Leu Arg Asp Arg Gln Lys Ser GluAsp Val Val Lys Pro Pro Leu Ser 2850 2855 2860 TAC CTG TTT GAA GAC AAAACT CAT TCT TCT TTC TCT TCT ACT GTC AAA 8820 Tyr Leu Phe Glu Asp Lys ThrHis Ser Ser Phe Ser Ser Thr Val Lys 2865 2870 2875 GAC AAA GCT GCA AGTGAA TCT ATA AGA GTG AAT CGA AGA TGC ATC AGT 8868 Asp Lys Ala Ala Ser GluSer Ile Arg Val Asn Arg Arg Cys Ile Ser 2880 2885 2890 GTT GCA CCA TCTAGA GAG ACA GCT GGT GAA TTG TTA CTA GGT AAA TGT 8916 Val Ala Pro Ser ArgGlu Thr Ala Gly Glu Leu Leu Leu Gly Lys Cys 2895 2900 2905 GGA ATG TATTTT GTG GAA GAT AAT GCT TCT GAT ACA GTT GAA AGT TCG 8964 Gly Met Tyr PheVal Glu Asp Asn Ala Ser Asp Thr Val Glu Ser Ser 2910 2915 2920 2925 AGCCTT CAG GGA GAG TTG GAA CCA GCA TCA TTT TCC TGG ACA TAT GAA 9012 Ser LeuGln Gly Glu Leu Glu Pro Ala Ser Phe Ser Trp Thr Tyr Glu 2930 2935 2940GAA ATT AAA GAA GTT CAC AAG CGT TGG TGG CAA TTG AGA GAT AAT GCT 9060 GluIle Lys Glu Val His Lys Arg Trp Trp Gln Leu Arg Asp Asn Ala 2945 29502955 GTA GAA ATC TTT CTA ACA AAT GGC AGA ACA CTC CTG TTG GCA TTT GAT9108 Val Glu Ile Phe Leu Thr Asn Gly Arg Thr Leu Leu Leu Ala Phe Asp2960 2965 2970 AAC ACC AAG GTT CGT GAT GAT GTA TAC CAC AAT ATA CTC ACAAAT AAC 9156 Asn Thr Lys Val Arg Asp Asp Val Tyr His Asn Ile Leu Thr AsnAsn 2975 2980 2985 CTC CCT AAT CTT CTG GAA TAT GGT AAC ATC ACC GCT CTGACA AAT TTA 9204 Leu Pro Asn Leu Leu Glu Tyr Gly Asn Ile Thr Ala Leu ThrAsn Leu 2990 2995 3000 3005 TGG TAT ACT GGG CAA ATT ACT AAT TTT GAA TATTTG ACT CAC TTA AAC 9252 Trp Tyr Thr Gly Gln Ile Thr Asn Phe Glu Tyr LeuThr His Leu Asn 3010 3015 3020 AAA CAT GCT GGC CGA TCC TTC AAT GAT CTCATG CAG TAT CCT GTG TTC 9300 Lys His Ala Gly Arg Ser Phe Asn Asp Leu MetGln Tyr Pro Val Phe 3025 3030 3035 
CCA TTT ATA CTT GCT GAC TAC GTT AGTGAG ACA CTT GAC CTC AAT GAT 9348 Pro Phe Ile Leu Ala Asp Tyr Val Ser GluThr Leu Asp Leu Asn Asp 3040 3045 3050 CTG TTG ATA TAC AGA AAT CTC TCTAAA CCT ATA GCT GTT CAG TAT AAA 9396 Leu Leu Ile Tyr Arg Asn Leu Ser LysPro Ile Ala Val Gln Tyr Lys 3055 3060 3065 GAA AAA GAA GAT CGT TAT GTGGAC ACA TAC AAG TAC TTG GAG GAA GAG 9444 Glu Lys Glu Asp Arg Tyr Val AspThr Tyr Lys Tyr Leu Glu Glu Glu 3070 3075 3080 3085 TAC CGC AAA GGA GCCAGA GAA GAT GAC CCC ATG CCT CCC GTG CAG CCC 9492 Tyr Arg Lys Gly Ala ArgGlu Asp Asp Pro Met Pro Pro Val Gln Pro 3090 3095 3100 TAT CAC TAT GGCTCC CAC TAT TCC AAT AGC GGC ACT GTG CTT CAC TTC 9540 Tyr His Tyr Gly SerHis Tyr Ser Asn Ser Gly Thr Val Leu His Phe 3105 3110 3115 CTG GTC AGGATG CCT CCT TTC ACT AAA ATG TTT TTA GCC TAT CAA GAT 9588 Leu Val Arg MetPro Pro Phe Thr Lys Met Phe Leu Ala Tyr Gln Asp 3120 3125 3130 CAA AGTTTT GAC ATT CCA GAC AGA ACT TTT CAT TCT ACA AAT ACA ACT 9636 Gln Ser PheAsp Ile Pro Asp Arg Thr Phe His Ser Thr Asn Thr Thr 3135 3140 3145 TGGCGA CTC TCA TCT TTT GAA TCT ATG ACT GAT GTG AAA GAA CTT ATC 9684 Trp ArgLeu Ser Ser Phe Glu Ser Met Thr Asp Val Lys Glu Leu Ile 3150 3155 31603165 CCA GAG TTT TTC TAT CTT CCA GAG TTC CTA GTT AAC CGT GAA GGT TTT9732 Pro Glu Phe Phe Tyr Leu Pro Glu Phe Leu Val Asn Arg Glu Gly Phe3170 3175 3180 GAT TTT GGT GTG CGT CAG AAT GGT GAA CGG GTT AAT CAC GTCAAC CTT 9780 Asp Phe Gly Val Arg Gln Asn Gly Glu Arg Val Asn His Val AsnLeu 3185 3190 3195 CCC CCT TGG GCG CGT AAT GAT CCT CGT CTT TTT ATC CTCATC CAT CGG 9828 Pro Pro Trp Ala Arg Asn Asp Pro Arg Leu Phe Ile Leu IleHis Arg 3200 3205 3210 CAG GCT CTA GAG TCT GAC TAC GTG TCG CAG AAC ATCTGT CAG TGG ATT 9876 Gln Ala Leu Glu Ser Asp Tyr Val Ser Gln Asn Ile CysGln Trp Ile 3215 3220 3225 GAC TTG GTG TTT GGG TAT AAG CAA AAG GGG AAGGCT TCT GTT CAA GCG 9924 Asp Leu Val Phe Gly Tyr Lys Gln Lys Gly Lys AlaSer Val Gln Ala 3230 3235 3240 3245 ATC AAT GTT TTT CAT CCT GCT ACA TATTTT GGA ATG GAT GTC TCT GCA 9972 Ile Asn Val Phe His 
Pro Ala Thr Tyr PheGly Met Asp Val Ser Ala 3250 3255 3260 GTT GAA GAT CCA GTT CAG AGA CGAGCG CTA GAA ACC ATG ATA AAA ACC 10020 Val Glu Asp Pro Val Gln Arg ArgAla Leu Glu Thr Met Ile Lys Thr 3265 3270 3275 TAC GGG CAG ACT CCC CGTCAG CTG TTC CAC ATG GCC CAT GTG AGC AGA 10068 Tyr Gly Gln Thr Pro ArgGln Leu Phe His Met Ala His Val Ser Arg 3280 3285 3290 CCT GGA GCC AAGCTC AAT ATT GAA GGA GAG CTT CCA GCT GCT GTG GGG 10116 Pro Gly Ala LysLeu Asn Ile Glu Gly Glu Leu Pro Ala Ala Val Gly 3295 3300 3305 TTG CTAGTG CAG TTT GCT TTC AGG GAG ACC CGA GAA CAG GTC AAA GAA 10164 Leu LeuVal Gln Phe Ala Phe Arg Glu Thr Arg Glu Gln Val Lys Glu 3310 3315 33203325 ATC ACC TAT CCG AGT CCT TTG TCA TGG ATA AAA GGC TTG AAA TGG GGG10212 Ile Thr Tyr Pro Ser Pro Leu Ser Trp Ile Lys Gly Leu Lys Trp Gly3330 3335 3340 GAA TAC GTG GGT TCC CCC AGT GCT CCA GTA CCT GTG GTC TGCTTC AGC 10260 Glu Tyr Val Gly Ser Pro Ser Ala Pro Val Pro Val Val CysPhe Ser 3345 3350 3355 CAG CCC CAC GGA GAA AGA TTT GGC TCT CTC CAG GCTCTG CCC ACC AGA 10308 Gln Pro His Gly Glu Arg Phe Gly Ser Leu Gln AlaLeu Pro Thr Arg 3360 3365 3370 GCA ATC TGT GGT TTG TCA CGG AAT TTC TGTCTT GTG ATG ACA TAT AGC 10356 Ala Ile Cys Gly Leu Ser Arg Asn Phe CysLeu Val Met Thr Tyr Ser 3375 3380 3385 AAG GAA CAA GGT GTG AGA AGC ATGAAC AGT ACG GAC ATT CAG TGG TCA 10404 Lys Glu Gln Gly Val Arg Ser MetAsn Ser Thr Asp Ile Gln Trp Ser 3390 3395 3400 3405 GCC ATC CTG AGC TGGGGA TAT GCT GAT AAT ATT TTA AGG TTG AAG AGT 10452 Ala Ile Leu Ser TrpGly Tyr Ala Asp Asn Ile Leu Arg Leu Lys Ser 3410 3415 3420 AAA CAA AGTGAG CCT CCA GTA AAC TTT ATT CAA AGT TCA CAA CAG TAC 10500 Lys Gln SerGlu Pro Pro Val Asn Phe Ile Gln Ser Ser Gln Gln Tyr 3425 3430 3435 CAGGTG ACT AGT TGT GCT TGG GTG CCT GAC AGT TGC CAG CTG TTT ACT 10548 GlnVal Thr Ser Cys Ala Trp Val Pro Asp Ser Cys Gln Leu Phe Thr 3440 34453450 GGA AGC AAA TGC GGT GTC ATC ACA GCC TAC ACA AAC AGA TTT ACA AGC10596 Gly Ser Lys Cys Gly Val Ile Thr Ala Tyr Thr Asn Arg Phe Thr Ser3455 3460 3465 AGC ACG CCA TCA GAA 
ATA GAA ATG GAG ACT CAA ATA CAT CTCTAT GGT 10644 Ser Thr Pro Ser Glu Ile Glu Met Glu Thr Gln Ile His LeuTyr Gly 3470 3475 3480 3485 CAC ACA GAA GAG ATA ACC AGC TTA TTT GTT TGCAAA CCA TAC AGT ATA 10692 His Thr Glu Glu Ile Thr Ser Leu Phe Val CysLys Pro Tyr Ser Ile 3490 3495 3500 CTG ATA AGT GTG AGC AGA GAC GGA ACCTGC ATC ATA TGG GAT TTA AAC 10740 Leu Ile Ser Val Ser Arg Asp Gly ThrCys Ile Ile Trp Asp Leu Asn 3505 3510 3515 AGG TTA TGC TAT GTA CAA AGTCTG GCG GGA CAC AAA AGC CCT GTC ACA 10788 Arg Leu Cys Tyr Val Gln SerLeu Ala Gly His Lys Ser Pro Val Thr 3520 3525 3530 GCT GTC TCT GCC AGTGAA ACC TCA GGT GAT ATT GCT ACT GTG TGT GAT 10836 Ala Val Ser Ala SerGlu Thr Ser Gly Asp Ile Ala Thr Val Cys Asp 3535 3540 3545 TCA GCT GGCGGA GGC AGT GAC CTC AGA CTC TGG ACG GTG AAC GGG GAT 10884 Ser Ala GlyGly Gly Ser Asp Leu Arg Leu Trp Thr Val Asn Gly Asp 3550 3555 3560 3565CTC GTT GGA CAT GTC CAC TGC AGG GAG ATC ATC TGT TCC GTG GCT TTC 10932Leu Val Gly His Val His Cys Arg Glu Ile Ile Cys Ser Val Ala Phe 35703575 3580 TCC AAC CAG CCT GAG GGA GTA TCT ATC AAT GTA ATC GCT GGG GGATTA 10980 Ser Asn Gln Pro Glu Gly Val Ser Ile Asn Val Ile Ala Gly GlyLeu 3585 3590 3595 GAA AAT GGA ATT GTT AGG TTA TGG AGC ACA TGG GAC TTAAAG CCT GTG 11028 Glu Asn Gly Ile Val Arg Leu Trp Ser Thr Trp Asp LeuLys Pro Val 3600 3605 3610 AGA GAA ATT ACA TTT CCC AAA TCA AAT AAG CCCATC ATC AGC CTT ACA 11076 Arg Glu Ile Thr Phe Pro Lys Ser Asn Lys ProIle Ile Ser Leu Thr 3615 3620 3625 TTT TCT TGT GAT GGC CAC CAT TTG TACACA GCA AAC AGT GAT GGG ACC 11124 Phe Ser Cys Asp Gly His His Leu TyrThr Ala Asn Ser Asp Gly Thr 3630 3635 3640 3645 GTG ATT GCC TGG TGT CGGAAG GAC CAG CAC CGC TTG AAA CAG CCA ATG 11172 Val Ile Ala Trp Cys ArgLys Asp Gln His Arg Leu Lys Gln Pro Met 3650 3655 3660 TTC TAT TCC TTCCTT AGC AGC TAT GCA GCC GGG TGA ATGCGAATGA 11218 Phe Tyr Ser Phe Leu SerSer Tyr Ala Ala Gly * 3665 3670 ACTTCATGTT CTCCAAAGCA CTTTAACTCCAAACTAGATT TGTTGACTTC ACCAGTTTTA 11278 GGAGGTTGAA CCTAAAGAAA TGGATGACTGGACAAACCAT 
CCAAATAATG ATAAAGTCTA 11338 TTCATCTGCA CAAAATTCTG AAGAGTCACATGATCCTAAG AGGAAAGTTC TGTTCTATTT 11398 TAGTGATAAT CTGGAAGATT GTGTCAATATGCACTAGCCA ACAAGTTTTA AGCCTCGCAT 11458 GGTACATTAA AATGATATTC TTAAAATTTTTTCCCACCAA GGTATTCCAA AGAAAATATT 11518 AAGGTCTCCC CTTTTTCTAT GATTCCAAAAGGACCAGTAG AATTTAAATT GGTTGGTTGA 11578 TGTTTATATA AAACACACTA AAATTATATTTTAAAAGTTT ATGCCTGAAA TACTCCTCCC 11638 ACCACACACA CATGCTCCAA AAGAGGAAAGAAAAAAAGAT AATTTTTAGG ACTTGATAAT 11698 TGCTTTCTTT GAGAAGCAAA TTATTCAGTAGGTGCCTCTG TACCAAATAT TTTATGGAAT 11758 ATCTAAATAC TAAAATAAAC TATGAATGAATCTCAAAATT AGGCAGTTTT TGCCAGTTGC 11818 TTTCTTAGCT CAAAGGAGAA CCAGAATTTTTTTGACAGCC ACAAACAAGA ATACAGGTAT 11878 CTTGGATTTC AGACACATTC TGTTTCTTCATAAAAATTTT ACTTAAAATC TGTAACGCTA 11938 GATATTGACT ATCCTTAGTT GAGTCACTGAGGTTTAAACA CAATGGTAAG TCTTAAAGTC 11998 TGCTATTTAC AGAGCATTGA ATCTGTACCAATTTGCAATA GAAAGCCTTC AGTATGCAAG 12058 AAGTTTGCAT GGGTATTAAG AACACAGCCTAAATAAGGCA TTTGATTAAT CTGCAGGAAG 12118 AATTTTCTTC CCCAAAACAG AATTATAAAAGCTTACTTTA AACAGGAGGC AGAATAATTC 12178 TTTTAGGAAA CCATTTCATT CTGTTTCTACTAACCTATAC CATCTGA 12225 3672 amino acids amino acid unknown protein 12Met Ser Thr Asp Ser Asn Ser Leu Ala Arg Glu Phe Leu Thr Asp Val 1 5 1015 Asn Arg Leu Cys Asn Ala Val Val Gln Arg Val Glu Ala Arg Glu Glu 20 2530 Glu Glu Glu Glu Thr His Met Ala Thr Leu Gly Gln Tyr Leu Val His 35 4045 Gly Arg Gly Phe Leu Leu Leu Thr Lys Leu Asn Ser Ile Ile Asp Gln 50 5560 Ala Leu Thr Cys Arg Glu Glu Leu Leu Thr Leu Leu Leu Ser Leu Leu 65 7075 80 Pro Leu Val Trp Lys Ile Pro Val Gln Glu Glu Lys Ala Thr Asp Phe 8590 95 Asn Leu Pro Leu Ser Ala Asp Ile Ile Leu Thr Lys Glu Lys Asn Ser100 105 110 Ser Ser Gln Arg Ser Thr Gln Glu Lys Leu His Leu Glu Gly SerAla 115 120 125 Leu Ser Ser Gln Val Ser Ala Lys Val Asn Val Phe Arg LysSer Arg 130 135 140 Arg Gln Arg Lys Ile Thr His Arg Tyr Ser Val Arg AspAla Arg Lys 145 150 155 160 Thr Gln Leu Ser Thr Ser Asp Ser Glu Ala AsnSer Asp Glu Lys Gly 165 170 175 Ile Ala Met Asn Lys His Arg Arg Pro HisLeu Leu His His Phe 
Leu 180 185 190 Thr Ser Phe Pro Lys Gln Asp His ProLys Ala Lys Leu Asp Arg Leu 195 200 205 Ala Thr Lys Glu Gln Thr Pro ProAsp Ala Met Ala Leu Glu Asn Ser 210 215 220 Arg Glu Ile Ile Pro Arg GlnGly Ser Asn Thr Asp Ile Leu Ser Glu 225 230 235 240 Pro Ala Ala Leu SerVal Ile Ser Asn Met Asn Asn Ser Pro Phe Asp 245 250 255 Leu Cys His ValLeu Leu Ser Leu Leu Glu Lys Val Cys Lys Phe Asp 260 265 270 Val Thr LeuAsn His Asn Ser Pro Leu Ala Ala Ser Val Val Pro Thr 275 280 285 Leu ThrGlu Phe Leu Ala Gly Phe Gly Asp Cys Cys Ser Leu Ser Asp 290 295 300 AsnLeu Glu Ser Arg Val Val Ser Ala Gly Trp Thr Glu Glu Pro Val 305 310 315320 Ala Leu Ile Gln Arg Met Leu Phe Arg Thr Val Leu His Leu Leu Ser 325330 335 Val Asp Val Ser Thr Ala Glu Met Met Pro Glu Asn Leu Arg Lys Asn340 345 350 Leu Thr Glu Leu Leu Arg Ala Ala Leu Lys Ile Arg Ile Cys LeuGlu 355 360 365 Lys Gln Pro Asp Pro Phe Ala Pro Arg Gln Lys Lys Thr LeuGln Glu 370 375 380 Val Gln Glu Asp Phe Val Phe Ser Lys Tyr Arg His ArgAla Leu Leu 385 390 395 400 Leu Pro Glu Leu Leu Glu Gly Val Leu Gln IleLeu Ile Cys Cys Leu 405 410 415 Gln Ser Ala Ala Ser Asn Pro Phe Tyr PheSer Gln Ala Met Asp Leu 420 425 430 Val Gln Glu Phe Ile Gln His His GlyPhe Asn Leu Phe Glu Thr Ala 435 440 445 Val Leu Gln Met Glu Trp Leu ValLeu Arg Asp Gly Val Pro Pro Glu 450 455 460 Ala Ser Glu His Leu Lys AlaLeu Ile Asn Ser Val Met Lys Ile Met 465 470 475 480 Ser Thr Val Lys LysVal Lys Ser Glu Gln Leu His His Ser Met Cys 485 490 495 Thr Arg Lys ArgHis Arg Arg Cys Glu Tyr Ser His Phe Met His His 500 505 510 His Arg AspLeu Ser Gly Leu Leu Val Ser Ala Phe Lys Asn Gln Val 515 520 525 Ser LysAsn Pro Phe Glu Glu Thr Ala Asp Gly Asp Val Tyr Tyr Pro 530 535 540 GluArg Cys Cys Cys Ile Ala Val Cys Ala His Gln Cys Leu Arg Leu 545 550 555560 Leu Gln Gln Ala Ser Leu Ser Ser Thr Cys Val Gln Ile Leu Ser Gly 565570 575 Val His Asn Ile Gly Ile Cys Cys Cys Met Asp Pro Lys Ser Val Ile580 585 590 Ile Pro Leu Leu His Ala Phe Lys Leu Pro Ala Leu Lys Asn PheGln 595 600 605 Gln His Ile Leu 
Asn Ile Leu Asn Lys Leu Ile Leu Asp GlnLeu Gly 610 615 620 Gly Ala Glu Ile Ser Pro Lys Ile Lys Lys Ala Ala CysAsn Ile Cys 625 630 635 640 Thr Val Asp Ser Asp Gln Leu Ala Gln Leu GluGlu Thr Leu Gln Gly 645 650 655 Asn Leu Cys Asp Ala Glu Leu Ser Ser SerLeu Ser Ser Pro Ser Tyr 660 665 670 Arg Phe Gln Gly Ile Leu Pro Ser SerGly Ser Glu Asp Leu Leu Trp 675 680 685 Lys Trp Asp Ala Leu Lys Ala TyrGln Asn Phe Val Phe Glu Glu Asp 690 695 700 Arg Leu His Ser Ile Gln IleAla Asn His Ile Cys Asn Leu Ile Gln 705 710 715 720 Lys Gly Asn Ile ValVal Gln Trp Lys Leu Tyr Asn Tyr Ile Phe Asn 725 730 735 Pro Val Leu GlnArg Gly Val Glu Leu Ala His His Cys Gln His Leu 740 745 750 Ser Val ThrSer Ala Gln Ser His Val Cys Ser His His Asn Gln Cys 755 760 765 Leu ProGln Asp Val Leu Gln Ile Tyr Val Lys Thr Leu Pro Ile Leu 770 775 780 LeuLys Ser Arg Val Ile Arg Asp Leu Phe Leu Ser Cys Asn Gly Val 785 790 795800 Ser Gln Ile Ile Glu Leu Asn Cys Leu Asn Gly Ile Arg Ser His Ser 805810 815 Leu Lys Ala Phe Glu Thr Leu Ile Ile Ser Leu Gly Glu Gln Gln Lys820 825 830 Asp Ala Ser Val Pro Asp Ile Asp Gly Ile Asp Ile Glu Gln LysGlu 835 840 845 Leu Ser Ser Val His Val Gly Thr Ser Phe His His Gln GlnAla Tyr 850 855 860 Ser Asp Ser Pro Gln Ser Leu Ser Lys Phe Tyr Ala GlyLeu Lys Glu 865 870 875 880 Ala Tyr Pro Lys Arg Arg Lys Thr Val Asn GlnAsp Val His Ile Asn 885 890 895 Thr Ile Asn Leu Phe Leu Cys Val Ala PheLeu Cys Val Ser Lys Glu 900 905 910 Ala Glu Ser Asp Arg Glu Ser Ala AsnAsp Ser Glu Asp Thr Ser Gly 915 920 925 Tyr Asp Ser Thr Ala Ser Glu ProLeu Ser His Met Leu Pro Cys Ile 930 935 940 Ser Leu Glu Ser Leu Val LeuPro Ser Pro Glu His Met His Gln Ala 945 950 955 960 Ala Asp Ile Trp SerMet Cys Arg Trp Ile Tyr Met Leu Ser Ser Val 965 970 975 Phe Gln Lys GlnPhe Tyr Arg Leu Gly Gly Phe Arg Val Cys His Lys 980 985 990 Leu Ile PheMet Ile Ile Gln Lys Leu Phe Arg Ser His Lys Glu Glu 995 1000 1005 GlnGly Lys Lys Glu Gly Asp Thr Ser Val Asn Glu Asn Gln Asp Leu 1010 10151020 Asn Arg Ile Ser Gln Pro Lys Arg Thr Met 
Lys Glu Asp Leu Leu Ser1025 1030 1035 1040 Leu Ala Ile Lys Ser Asp Pro Ile Pro Ser Glu Leu GlySer Leu Lys 1045 1050 1055 Lys Ser Ala Asp Ser Leu Gly Lys Leu Glu LeuGln His Ile Ser Ser 1060 1065 1070 Ile Asn Val Glu Glu Val Ser Ala ThrGlu Ala Ala Pro Glu Glu Ala 1075 1080 1085 Lys Leu Phe Thr Ser Gln GluSer Glu Thr Ser Leu Gln Ser Ile Arg 1090 1095 1100 Leu Leu Glu Ala LeuLeu Ala Ile Cys Leu His Gly Ala Arg Thr Ser 1105 1110 1115 1120 Gln GlnLys Met Glu Leu Glu Leu Pro Asn Gln Asn Leu Ser Val Glu 1125 1130 1135Ser Ile Leu Phe Glu Met Arg Asp His Leu Ser Gln Ser Lys Val Ile 11401145 1150 Glu Thr Gln Leu Ala Lys Pro Leu Phe Asp Ala Leu Leu Arg ValAla 1155 1160 1165 Leu Gly Asn Tyr Ser Ala Asp Phe Glu His Asn Asp AlaMet Thr Glu 1170 1175 1180 Lys Ser His Gln Ser Ala Glu Glu Leu Ser SerGln Pro Gly Asp Phe 1185 1190 1195 1200 Ser Glu Glu Ala Glu Asp Ser GlnCys Cys Ser Phe Lys Leu Leu Val 1205 1210 1215 Glu Glu Glu Gly Tyr GluAla Asp Ser Glu Ser Asn Pro Glu Asp Gly 1220 1225 1230 Glu Thr Gln AspAsp Gly Val Asp Leu Lys Ser Glu Thr Glu Gly Phe 1235 1240 1245 Ser AlaSer Ser Ser Pro Asn Asp Leu Leu Glu Asn Leu Thr Gln Gly 1250 1255 1260Glu Ile Ile Tyr Pro Glu Ile Cys Met Leu Glu Leu Asn Leu Leu Ser 12651270 1275 1280 Ala Ser Lys Ala Lys Leu Asp Val Leu Ala His Val Phe GluSer Phe 1285 1290 1295 Leu Lys Ile Ile Arg Gln Lys Glu Lys Asn Val PheLeu Leu Met Gln 1300 1305 1310 Gln Gly Thr Val Lys Asn Leu Leu Gly GlyPhe Leu Ser Ile Leu Thr 1315 1320 1325 Gln Asp Asp Ser Asp Phe Gln AlaCys Gln Arg Val Leu Val Asp Leu 1330 1335 1340 Leu Val Ser Leu Met SerSer Arg Thr Cys Ser Glu Glu Leu Thr Leu 1345 1350 1355 1360 Leu Leu ArgIle Phe Leu Glu Lys Ser Pro Cys Thr Lys Ile Leu Leu 1365 1370 1375 LeuGly Ile Leu Lys Ile Ile Glu Ser Asp Thr Thr Met Ser Pro Ser 1380 13851390 Gln Tyr Leu Thr Phe Pro Leu Leu His Ala Pro Asn Leu Ser Asn Gly1395 1400 1405 Val Ser Ser Gln Lys Tyr Pro Gly Ile Leu Asn Ser Lys AlaMet Gly 1410 1415 1420 Leu Leu Arg Arg Ala Arg Val Ser Arg Ser Lys LysGlu Ala Asp Arg 1425 
1430 1435 1440 Glu Ser Phe Pro His Arg Leu Leu SerSer Trp His Ile Ala Pro Val 1445 1450 1455 His Leu Pro Leu Leu Gly GlnAsn Cys Trp Pro His Leu Ser Glu Gly 1460 1465 1470 Phe Ser Val Ser LeuTrp Phe Asn Val Glu Cys Ile His Glu Ala Glu 1475 1480 1485 Ser Thr ThrGlu Lys Gly Lys Lys Ile Lys Lys Arg Asn Lys Ser Leu 1490 1495 1500 IleLeu Pro Asp Ser Ser Phe Asp Gly Thr Glu Ser Asp Arg Pro Glu 1505 15101515 1520 Gly Ala Glu Tyr Ile Asn Pro Gly Glu Arg Leu Ile Glu Glu GlyCys 1525 1530 1535 Ile His Ile Ile Ser Leu Gly Ser Lys Ala Leu Met IleGln Val Trp 1540 1545 1550 Ala Asp Pro His Asn Ala Thr Leu Ile Phe ArgVal Cys Met Asp Ser 1555 1560 1565 Asn Asp Asp Met Lys Ala Val Leu LeuAla Gln Val Glu Ser Gln Glu 1570 1575 1580 Asn Ile Phe Leu Pro Ser LysTrp Gln His Leu Val Leu Thr Tyr Leu 1585 1590 1595 1600 Gln Gln Pro GlnGly Lys Arg Arg Ile His Gly Lys Ile Ser Ile Trp 1605 1610 1615 Val SerGly Gln Arg Lys Pro Asp Val Thr Leu Asp Phe Met Leu Pro 1620 1625 1630Arg Lys Thr Ser Leu Ser Ser Asp Ser Asn Lys Thr Phe Cys Met Ile 16351640 1645 Gly His Cys Leu Ser Ser Gln Glu Glu Phe Leu Gln Leu Ala GlyLys 1650 1655 1660 Trp Asp Leu Gly Asn Leu Leu Leu Phe Asn Gly Ala LysVal Gly Ser 1665 1670 1675 1680 Gln Glu Ala Phe Tyr Leu Tyr Ala Cys GlyPro Asn His Thr Ser Val 1685 1690 1695 Met Pro Cys Lys Tyr Gly Lys ProVal Asn Asp Tyr Ser Lys Tyr Ile 1700 1705 1710 Asn Lys Glu Ile Leu ArgCys Glu Gln Ile Arg Glu Leu Phe Met Thr 1715 1720 1725 Lys Lys Asp ValAsp Ile Gly Leu Leu Ile Glu Ser Leu Ser Val Val 1730 1735 1740 Tyr ThrThr Tyr Cys Pro Ala Gln Tyr Thr Ile Tyr Glu Pro Val Ile 1745 1750 17551760 Arg Leu Lys Gly Gln Met Lys Thr Gln Leu Ser Gln Arg Pro Phe Ser1765 1770 1775 Ser Lys Glu Val Gln Ser Ile Leu Leu Glu Pro His His LeuLys Asn 1780 1785 1790 Leu Gln Pro Thr Glu Tyr Lys Thr Ile Gln Gly IleLeu His Glu Ile 1795 1800 1805 Gly Gly Thr Gly Ile Phe Val Phe Leu PheAla Arg Val Val Glu Leu 1810 1815 1820 Ser Ser Cys Glu Glu Thr Gln AlaLeu Ala Leu Arg Val Ile Leu Ser 1825 1830 1835 1840 Leu Ile Lys 
Tyr AsnGln Gln Arg Val His Glu Leu Glu Asn Cys Asn 1845 1850 1855 Gly Leu SerMet Ile His Gln Val Leu Ile Lys Gln Lys Cys Ile Val 1860 1865 1870 GlyPhe Tyr Ile Leu Lys Thr Leu Leu Glu Gly Cys Cys Gly Glu Asp 1875 18801885 Ile Ile Tyr Met Asn Glu Asn Gly Glu Phe Lys Leu Asp Val Asp Ser1890 1895 1900 Asn Ala Ile Ile Gln Asp Val Lys Leu Leu Glu Glu Leu LeuLeu Asp 1905 1910 1915 1920 Trp Lys Ile Trp Ser Lys Ala Glu Gln Gly ValTrp Glu Thr Leu Leu 1925 1930 1935 Ala Ala Leu Glu Val Leu Ile Arg AlaAsp His His Gln Gln Met Phe 1940 1945 1950 Asn Ile Lys Gln Leu Leu LysAla Gln Val Val His His Phe Leu Leu 1955 1960 1965 Thr Cys Gln Val LeuGln Glu Tyr Lys Glu Gly Gln Leu Thr Pro Met 1970 1975 1980 Pro Arg GluVal Cys Arg Ser Phe Val Lys Ile Ile Ala Glu Val Leu 1985 1990 1995 2000Gly Ser Pro Pro Asp Leu Glu Leu Leu Thr Ile Ile Phe Asn Phe Leu 20052010 2015 Leu Ala Val His Pro Pro Thr Asn Thr Tyr Val Cys His Asn ProThr 2020 2025 2030 Asn Phe Tyr Phe Ser Leu His Ile Asp Gly Lys Ile PheGln Glu Lys 2035 2040 2045 Val Arg Ser Ile Met Tyr Leu Arg His Ser SerSer Gly Gly Arg Ser 2050 2055 2060 Leu Met Ser Pro Gly Phe Met Val IleSer Pro Ser Gly Phe Thr Ala 2065 2070 2075 2080 Ser Pro Tyr Glu Gly GluAsn Ser Ser Asn Ile Ile Pro Gln Gln Met 2085 2090 2095 Ala Ala His MetLeu Arg Ser Arg Ser Leu Pro Ala Phe Pro Thr Ser 2100 2105 2110 Ser LeuLeu Thr Gln Ser Gln Lys Leu Thr Gly Ser Leu Gly Cys Ser 2115 2120 2125Ile Asp Arg Leu Gln Asn Ile Ala Asp Thr Tyr Val Ala Thr Gln Ser 21302135 2140 Lys Lys Gln Asn Ser Leu Gly Ser Ser Asp Thr Leu Lys Lys GlyLys 2145 2150 2155 2160 Glu Asp Ala Phe Ile Ser Ser Cys Glu Ser Ala LysThr Val Cys Glu 2165 2170 2175 Met Glu Ala Val Leu Ser Ala Gln Val SerVal Ser Asp Val Pro Lys 2180 2185 2190 Gly Val Leu Gly Phe Pro Val ValLys Ala Asp His Lys Gln Leu Gly 2195 2200 2205 Ala Glu Pro Arg Ser GluAsp Asp Ser Pro Gly Asp Glu Ser Cys Pro 2210 2215 2220 Arg Arg Pro AspTyr Leu Lys Gly Leu Ala Ser Phe Gln Arg Ser His 2225 2230 2235 2240 SerThr Ile Ala Ser Leu Gly Leu Ala Phe 
Pro Ser Gln Asn Gly Ser 2245 22502255 Ala Ala Val Gly Arg Trp Pro Ser Leu Val Asp Arg Asn Thr Asp Asp2260 2265 2270 Trp Glu Asn Phe Ala Tyr Ser Leu Gly Tyr Glu Pro Asn TyrAsn Arg 2275 2280 2285 Thr Ala Ser Ala His Ser Val Thr Glu Asp Cys LeuVal Pro Ile Cys 2290 2295 2300 Cys Gly Leu Tyr Glu Leu Leu Ser Gly ValLeu Leu Ile Leu Pro Asp 2305 2310 2315 2320 Val Leu Leu Glu Asp Val MetAsp Lys Leu Ile Gln Ala Asp Thr Leu 2325 2330 2335 Leu Val Leu Val AsnHis Pro Ser Pro Ala Ile Gln Gln Gly Val Ile 2340 2345 2350 Lys Leu LeuAsp Ala Tyr Phe Ala Arg Ala Ser Lys Glu Gln Lys Asp 2355 2360 2365 LysPhe Leu Lys Asn Arg Gly Phe Ser Leu Leu Ala Asn Gln Leu Tyr 2370 23752380 Leu His Arg Gly Thr Gln Glu Leu Leu Glu Cys Phe Ile Glu Met Phe2385 2390 2395 2400 Phe Gly Arg His Ile Gly Leu Asp Glu Glu Phe Asp LeuGlu Asp Val 2405 2410 2415 Arg Asn Met Gly Leu Phe Gln Lys Trp Ser ValIle Pro Ile Leu Gly 2420 2425 2430 Leu Ile Glu Thr Ser Leu Tyr Asp AsnIle Leu Leu His Asn Ala Leu 2435 2440 2445 Leu Leu Leu Pro His His AlaVal Val Gln Lys Arg Lys Ser Ile Ala 2450 2455 2460 Gly Pro Arg Lys PhePro Leu Ala Gln Thr Glu Ser Leu Leu Met Lys 2465 2470 2475 2480 Met ArgSer Val Ala Asn Asp Glu Leu His Val Met Met Gln Arg Arg 2485 2490 2495Met Ser Gln Glu Asn Pro Ser Gln Ala Thr Glu Thr Glu Leu Ala Gln 25002505 2510 Arg Leu Gln Arg Leu Thr Val Leu Ala Val Asn Arg Ile Ile TyrGln 2515 2520 2525 Glu Phe Asn Ser Asp Ile Ile Asp Ile Leu Arg Thr ProGlu Asn Val 2530 2535 2540 Thr Gln Ser Lys Thr Ser Val Phe Gln Thr GluIle Ser Glu Glu Asn 2545 2550 2555 2560 Ile His His Glu Gln Ser Ser ValPhe Asn Pro Phe Gln Lys Glu Ile 2565 2570 2575 Phe Thr Tyr Leu Val GluGly Phe Lys Val Ser Ile Gly Ser Ser Lys 2580 2585 2590 Ala Ser Gly SerLys Gln Gln Trp Thr Lys Ile Leu Trp Ser Cys Lys 2595 2600 2605 Glu ThrPhe Arg Met Gln Leu Gly Arg Leu Leu Val His Ile Leu Ser 2610 2615 2620Pro Ala His Ala Ala Gln Glu Arg Lys Gln Ile Phe Glu Ile Val His 26252630 2635 2640 Glu Pro Asn His Gln Glu Ile Leu Arg Asp Cys Leu Ser ProSer Leu 2645 
2650 2655 Gln His Gly Ala Lys Leu Val Leu Tyr Leu Ser GluLeu Ile His Asn 2660 2665 2670 His Gln Gly Glu Leu Thr Glu Glu Glu LeuGly Thr Ala Glu Leu Leu 2675 2680 2685 Met Asn Ala Leu Lys Leu Cys GlyHis Lys Cys Ile Pro Pro Ser Ala 2690 2695 2700 Ser Thr Lys Ala Asp LeuIle Lys Met Ile Lys Glu Glu Gln Lys Lys 2705 2710 2715 2720 Tyr Glu ThrGlu Glu Gly Val Asn Lys Ala Ala Trp Gln Lys Thr Val 2725 2730 2735 AsnAsn Asn Gln Gln Ser Leu Phe Gln Arg Leu Asp Ser Lys Ser Lys 2740 27452750 Asp Ile Ser Lys Ile Ala Ala Asp Ile Thr Gln Ala Val Ser Leu Ser2755 2760 2765 Gln Gly Asn Glu Arg Lys Lys Val Ile Gln His Ile Arg GlyMet Tyr 2770 2775 2780 Lys Val Asp Leu Ser Ala Ser Arg His Trp Gln GluLeu Ile Gln Gln 2785 2790 2795 2800 Leu Thr His Asp Arg Ala Val Trp TyrAsp Pro Ile Tyr Tyr Pro Thr 2805 2810 2815 Ser Trp Gln Leu Asp Pro ThrGlu Gly Pro Asn Arg Glu Arg Arg Arg 2820 2825 2830 Leu Gln Arg Cys TyrLeu Thr Ile Pro Asn Lys Tyr Leu Leu Arg Asp 2835 2840 2845 Arg Gln LysSer Glu Asp Val Val Lys Pro Pro Leu Ser Tyr Leu Phe 2850 2855 2860 GluAsp Lys Thr His Ser Ser Phe Ser Ser Thr Val Lys Asp Lys Ala 2865 28702875 2880 Ala Ser Glu Ser Ile Arg Val Asn Arg Arg Cys Ile Ser Val AlaPro 2885 2890 2895 Ser Arg Glu Thr Ala Gly Glu Leu Leu Leu Gly Lys CysGly Met Tyr 2900 2905 2910 Phe Val Glu Asp Asn Ala Ser Asp Thr Val GluSer Ser Ser Leu Gln 2915 2920 2925 Gly Glu Leu Glu Pro Ala Ser Phe SerTrp Thr Tyr Glu Glu Ile Lys 2930 2935 2940 Glu Val His Lys Arg Trp TrpGln Leu Arg Asp Asn Ala Val Glu Ile 2945 2950 2955 2960 Phe Leu Thr AsnGly Arg Thr Leu Leu Leu Ala Phe Asp Asn Thr Lys 2965 2970 2975 Val ArgAsp Asp Val Tyr His Asn Ile Leu Thr Asn Asn Leu Pro Asn 2980 2985 2990Leu Leu Glu Tyr Gly Asn Ile Thr Ala Leu Thr Asn Leu Trp Tyr Thr 29953000 3005 Gly Gln Ile Thr Asn Phe Glu Tyr Leu Thr His Leu Asn Lys HisAla 3010 3015 3020 Gly Arg Ser Phe Asn Asp Leu Met Gln Tyr Pro Val PhePro Phe Ile 3025 3030 3035 3040 Leu Ala Asp Tyr Val Ser Glu Thr Leu AspLeu Asn Asp Leu Leu Ile 3045 3050 3055 Tyr Arg Asn Leu 
Ser Lys Pro IleAla Val Gln Tyr Lys Glu Lys Glu 3060 3065 3070 Asp Arg Tyr Val Asp ThrTyr Lys Tyr Leu Glu Glu Glu Tyr Arg Lys 3075 3080 3085 Gly Ala Arg GluAsp Asp Pro Met Pro Pro Val Gln Pro Tyr His Tyr 3090 3095 3100 Gly SerHis Tyr Ser Asn Ser Gly Thr Val Leu His Phe Leu Val Arg 3105 3110 31153120 Met Pro Pro Phe Thr Lys Met Phe Leu Ala Tyr Gln Asp Gln Ser Phe3125 3130 3135 Asp Ile Pro Asp Arg Thr Phe His Ser Thr Asn Thr Thr TrpArg Leu 3140 3145 3150 Ser Ser Phe Glu Ser Met Thr Asp Val Lys Glu LeuIle Pro Glu Phe 3155 3160 3165 Phe Tyr Leu Pro Glu Phe Leu Val Asn ArgGlu Gly Phe Asp Phe Gly 3170 3175 3180 Val Arg Gln Asn Gly Glu Arg ValAsn His Val Asn Leu Pro Pro Trp 3185 3190 3195 3200 Ala Arg Asn Asp ProArg Leu Phe Ile Leu Ile His Arg Gln Ala Leu 3205 3210 3215 Glu Ser AspTyr Val Ser Gln Asn Ile Cys Gln Trp Ile Asp Leu Val 3220 3225 3230 PheGly Tyr Lys Gln Lys Gly Lys Ala Ser Val Gln Ala Ile Asn Val 3235 32403245 Phe His Pro Ala Thr Tyr Phe Gly Met Asp Val Ser Ala Val Glu Asp3250 3255 3260 Pro Val Gln Arg Arg Ala Leu Glu Thr Met Ile Lys Thr TyrGly Gln 3265 3270 3275 3280 Thr Pro Arg Gln Leu Phe His Met Ala His ValSer Arg Pro Gly Ala 3285 3290 3295 Lys Leu Asn Ile Glu Gly Glu Leu ProAla Ala Val Gly Leu Leu Val 3300 3305 3310 Gln Phe Ala Phe Arg Glu ThrArg Glu Gln Val Lys Glu Ile Thr Tyr 3315 3320 3325 Pro Ser Pro Leu SerTrp Ile Lys Gly Leu Lys Trp Gly Glu Tyr Val 3330 3335 3340 Gly Ser ProSer Ala Pro Val Pro Val Val Cys Phe Ser Gln Pro His 3345 3350 3355 3360Gly Glu Arg Phe Gly Ser Leu Gln Ala Leu Pro Thr Arg Ala Ile Cys 33653370 3375 Gly Leu Ser Arg Asn Phe Cys Leu Val Met Thr Tyr Ser Lys GluGln 3380 3385 3390 Gly Val Arg Ser Met Asn Ser Thr Asp Ile Gln Trp SerAla Ile Leu 3395 3400 3405 Ser Trp Gly Tyr Ala Asp Asn Ile Leu Arg LeuLys Ser Lys Gln Ser 3410 3415 3420 Glu Pro Pro Val Asn Phe Ile Gln SerSer Gln Gln Tyr Gln Val Thr 3425 3430 3435 3440 Ser Cys Ala Trp Val ProAsp Ser Cys Gln Leu Phe Thr Gly Ser Lys 3445 3450 3455 Cys Gly Val IleThr Ala Tyr Thr Asn Arg Phe 
Thr Ser Ser Thr Pro 3460 3465 3470
Ser Glu Ile Glu Met Glu Thr Gln Ile His Leu Tyr Gly His Thr Glu 3475 3480 3485
Glu Ile Thr Ser Leu Phe Val Cys Lys Pro Tyr Ser Ile Leu Ile Ser 3490 3495 3500
Val Ser Arg Asp Gly Thr Cys Ile Ile Trp Asp Leu Asn Arg Leu Cys 3505 3510 3515 3520
Tyr Val Gln Ser Leu Ala Gly His Lys Ser Pro Val Thr Ala Val Ser 3525 3530 3535
Ala Ser Glu Thr Ser Gly Asp Ile Ala Thr Val Cys Asp Ser Ala Gly 3540 3545 3550
Gly Gly Ser Asp Leu Arg Leu Trp Thr Val Asn Gly Asp Leu Val Gly 3555 3560 3565
His Val His Cys Arg Glu Ile Ile Cys Ser Val Ala Phe Ser Asn Gln 3570 3575 3580
Pro Glu Gly Val Ser Ile Asn Val Ile Ala Gly Gly Leu Glu Asn Gly 3585 3590 3595 3600
Ile Val Arg Leu Trp Ser Thr Trp Asp Leu Lys Pro Val Arg Glu Ile 3605 3610 3615
Thr Phe Pro Lys Ser Asn Lys Pro Ile Ile Ser Leu Thr Phe Ser Cys 3620 3625 3630
Asp Gly His His Leu Tyr Thr Ala Asn Ser Asp Gly Thr Val Ile Ala 3635 3640 3645
Trp Cys Arg Lys Asp Gln His Arg Leu Lys Gln Pro Met Phe Tyr Ser 3650 3655 3660
Phe Leu Ser Ser Tyr Ala Ala Gly 3665 3670
21 base pairs nucleic acid single linear DNA 13 CAGTGGAATG ACCACCAGGC C 21
21 base pairs nucleic acid single linear DNA 14 GTTGCAGGCA TGTACCACTA C 21
21 base pairs nucleic acid single linear DNA 15 TATGAACCTA CCAAAGCAGA C 21
20 base pairs nucleic acid single linear DNA 16 ACTTCGGAAG TAGTTGTCTC 20
20 base pairs nucleic acid single linear DNA 17 CAAAGAAAGC GCTCAGAAAC 20
20 base pairs nucleic acid single linear DNA 18 AAAGAGGAAA ACCCAAGACT 20
20 base pairs nucleic acid single linear DNA 19 CAAAAACAAG ACACCCAAGT 20
20 base pairs nucleic acid single linear DNA 20 TGTTGAATTG AGTGTTGTAG 20
20 base pairs nucleic acid single linear DNA 21 CCAGCCACAG AATACCATCC 20
20 base pairs nucleic acid single linear DNA 22 GGACATACTC TGCTGCCATC 20
21 base pairs nucleic acid single linear DNA 23 ACCCCAGAAC TTGAGAAATA G 21
21 base pairs nucleic acid single linear DNA 24 TGCTGAGGTG ATAGGTTTAT G 21
20 base pairs nucleic acid single linear DNA 25 ATTGGCTAGT GTGTGCAGAC 20
20 base pairs nucleic acid single linear DNA 26 GAAGCAGATG ACTGAGCAGA 20
22 base pairs nucleic acid single linear DNA 27 TCTTCTTGTC CTGCCTGATG CT 22
21 base pairs nucleic acid single linear DNA 28 GTGCTTCACT TCCTCCAGAT C 21
18 base pairs nucleic acid single linear DNA 29 GCCTCATTCC AGCGAAGC 18
25 base pairs nucleic acid single linear DNA 30 CTGGATAGCA GGTGATGGGT GGTTA 25
21 base pairs nucleic acid single linear DNA 31 TGCTGTGGAT TATATGAACT C 21
21 base pairs nucleic acid single linear DNA 32 GGTCTCTATT AGTCCGAGAA C 21
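As an illustrative aid only, and not part of the sequence listing as filed, the short oligonucleotide entries above (SEQ ID NOS: 13-32) can be checked for internal consistency by comparing each declared length with the number of bases actually listed. The following is a minimal Python sketch under stated assumptions: the record layout (length, descriptors, SEQ ID NO, bases, trailing length) is inferred from the entries themselves, the regular expression and the function names `parse_record`, `check_records` and `reverse_complement` are hypothetical, and only four of the twenty entries are included for brevity.

```python
import re

# Four primer entries copied from the listing above, in the inferred order:
# declared length, descriptors, SEQ ID NO, base blocks, trailing length.
RECORDS = [
    "21 base pairs nucleic acid single linear DNA 13 CAGTGGAATG ACCACCAGGC C 21",
    "20 base pairs nucleic acid single linear DNA 16 ACTTCGGAAG TAGTTGTCTC 20",
    "18 base pairs nucleic acid single linear DNA 29 GCCTCATTCC AGCGAAGC 18",
    "25 base pairs nucleic acid single linear DNA 30 CTGGATAGCA GGTGATGGGT GGTTA 25",
]

# Hypothetical pattern for this flattened layout; it is an assumption made for
# illustration, not an official sequence-listing parser.
RECORD_RE = re.compile(
    r"^(?P<length>\d+) base pairs nucleic acid single linear DNA "
    r"(?P<seq_id>\d+) (?P<bases>[ACGT ]+?) (?P<tail>\d+)$"
)

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_record(line):
    """Return (seq_id, sequence, declared_length, trailing_length) for one entry."""
    match = RECORD_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized record: {line!r}")
    sequence = match["bases"].replace(" ", "")
    return int(match["seq_id"]), sequence, int(match["length"]), int(match["tail"])


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def check_records(lines):
    """Parse entries and confirm both length fields match the listed bases."""
    primers = {}
    for line in lines:
        seq_id, sequence, declared, trailing = parse_record(line)
        if not (len(sequence) == declared == trailing):
            raise ValueError(f"SEQ ID NO:{seq_id} length mismatch")
        primers[seq_id] = sequence
    return primers


primers = check_records(RECORDS)
for seq_id, sequence in sorted(primers.items()):
    print(seq_id, sequence, reverse_complement(sequence))
```

A check of this kind only confirms that each entry is self-consistent; it says nothing about whether a given oligonucleotide anneals to the bg sequences, which would require comparison against the full nucleotide sequences of the listing.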

What is claimed is:
 1. An isolated nucleic acid molecule containing the nucleotide sequence of FIG. 4 (SEQ ID NO:1), FIG. 7 or FIG. 8.
 2. An isolated nucleic acid molecule capable of complementing a bg mutation, having a nucleotide sequence that: (a) encodes the amino acid sequence shown in FIG. 4, FIG. 7 or FIG. 8; or (b) hybridizes under stringent conditions to the nucleotide sequence of (a) or to its complement.
 3. A nucleotide vector containing the nucleotide sequence of claim 1 or 2.
 4. An expression vector containing the nucleotide sequence of claim 1 or 2 in operative association with a nucleotide regulatory sequence that controls expression of the nucleotide sequence in a host cell.
 5. The expression vector of claim 4, wherein said regulatory element is selected from the group consisting of the cytomegalovirus hCMV immediate early gene, the early or late promoters of SV40 adenovirus, the lac system, the trp system, the TAC system, the TRC system, the major operator and promoter regions of phage λ, the control regions of fd coat protein, the promoter for 3-phosphoglycerate kinase, the promoters of acid phosphatase, and the promoters of the yeast α-mating factors.
 6. A genetically engineered host cell that contains the nucleotide sequence of claim 1 or 2.
 7. A genetically engineered host cell that contains the nucleotide sequence of claim 1 or 2 in operative association with a nucleotide regulatory sequence that controls expression of the nucleotide sequence in the host cell.
 8. An isolated bg protein.
 9. The isolated bg protein of claim 8, wherein the protein has the amino acid sequence shown in FIG. 4, FIG. 7 or FIG. 8.
 10. An antibody that immunospecifically binds the bg protein of claim 8.
 11. A method for diagnosing intracellular vesicle disorders, in a mammal, comprising measuring bg gene expression in a patient sample.
 12. The method of claim 11 in which expression is measured by detecting mRNA transcripts of the bg gene.
 13. The method of claim 11 in which expression is measured by detecting the bg gene product.
 14. A method for diagnosing intracellular vesicle disorders in a mammal, comprising detecting a bg gene mutation contained in the genome of the mammal.
 15. The method of claim 14 in which the mutation is located in a splice site of the bg gene.
 16. The method of claim 11 or 14 wherein the intracellular vesicle disorder is Chediak-Higashi syndrome.
 17. A method for screening compounds useful for the treatment of intracellular vesicle disorders, comprising contacting a compound with a cultured cell that expresses the bg gene, and detecting a change in the expression of the bg gene by the cultured cell.
 18. The method of claim 17 wherein the intracellular vesicle disorder is Chediak-Higashi syndrome.
 19. A method for treating an intracellular vesicle disorder in a mammal, comprising administering a compound to the mammal that modulates the expression of the bg gene in the mammal.
 20. The method of claim 19 in which the intracellular vesicle disorder is Chediak-Higashi syndrome.